In order to transmit biochemical signals, biological regulatory systems dissipate energy with concomitant entropy production. Additionally, signaling often takes place in challenging environmental conditions. In a simple model regulatory circuit given by an input and a delayed output, we explore the trade-offs between information transmission and the system's energetic efficiency. We determine the maximally informative network, given a fixed amount of entropy production and delayed response, exploring both the case with and without feedback. We find that feedback allows the circuit to overcome energy constraints and transmit close to the maximum available information even in the dissipationless limit. Negative feedback loops, characteristic of shock responses, are optimal at high dissipation. Close to equilibrium, positive feedback loops, known for their stability, become more informative. Asking how the signaling network should be constructed to best function in the worst possible environment, rather than an optimally tuned one or in steady state, we discover that at large dissipation the same universal motif is optimal in all of these conditions.


I. INTRODUCTION 

Cells respond to the current state of their environment by processing external signals through molecular networks and cascades. An external chemical stimulus is measured by receptors, which activate a series of biochemical reactions and lead the cell to produce an appropriate response. This response can be activating a gene or pathway, producing proteins that process the signal as in the case of sugar metabolism, resulting in motion as in the case of chemotaxis, or initiating a cellular response such as apoptosis. As we learn more about the structure of biochemical networks, we need to understand the functional role of their elements and connections. Yet regulation comes at a cost, which imposes constraints on the form of these networks. Here we consider the limitations coming from thermodynamic constraints, caused by the cell's energy consumption, on the architecture of regulatory elements that best convey information about input signals to their outputs. We compare these most informative network structures to circuits that transmit the largest amount of information in unfavorable environmental conditions.

Despite the large complexity of biological regulatory networks, not all possible molecular regulatory circuits can be found in living organisms [T]. One can ask whether the network architectures and parameter regimes are only shaped by the evolutionary history of these organisms, or whether there are also physical limits that constrain them. In recent years, a number of groups have explored different physical principles that could influence the parameter regimes and modes of regulation in living organisms (e.g. [MSI]). One approach has been to calculate the limits that the intrinsic randomness in gene regulation imposes on information transmission between the input signal and its output responses, in networks of varying complexity [TTJ] [22-28]. These studies showed which network architectures are optimal for information transmission and found that distinguishing different output states in general increases the transmitted information. They also pointed to the important trade-offs between the information that the output has about the input and molecular costs.

The validity of the assumption that biochemical regulatory networks are maximally transmitting information between the concentrations of their input and output proteins was tested by Tkacik et al. [Mj in the case of the Bicoid morphogen gradient. Bicoid proteins regulate the expression of the hunchback gene in early fruit fly development. Using detailed measurements of the concentration and noise profiles of the Hunchback protein as a function of Bicoid concentration EOIEI], the prediction for the probability distribution of the output concentration obtained by maximizing the flow of information was shown to match the experimental Hunchback distribution extremely well. In another combined experimental and theoretical study, Cheong et al. [32] measured the amount of information transmitted to NF-kappa B controlled genes in the case of TNF stimulation. They showed how bottlenecks in this system reduce the amount of transmitted information compared to regulation via multiple independent pathways. They argued that negative feedback, or information sharing between cells, can help transmit more information. The NF-kappa B and ERK pathways were recently used to demonstrate that dynamical measurements of the response can transmit more information than static or one-time readouts [33]. Lastly, an information-theoretic approach was used in an experimental and numerical study to show the interdependence of stochastic processes controlling enzymatic calcium signaling under different cellular conditions [34].

Many of the current approaches to information transmission have looked at instantaneous information transmission [n m or the rate of information transmission da mEii Eg. However, it has been argued that information transmission may be enhanced by dynamic biochemical readouts at multiple time points [33] or when the regulatory response is at a delay relative to input signaling m-. Additionally, many biochemical networks function out of steady state, responding to inputs that are changing in time. Examples include the chemotactic response of bacteria or amoebas to nutrients or conversely to antibiotics.

Inspired by these observations, we previously studied the optimal circuits for transmitting information between an input and output read out with a fixed delay, in and out of steady state [38]. Delayed readouts are natural to most biochemical circuits, since sensing a signal requires production of the response, which takes time. For example, sensing an increased sugar concentration means the cell has to produce the enzyme to degrade it. We asked whether different readout delays correspond to different optimal circuits. We found that topologies of maximally informative networks correspond to commonly occurring negative feedback circuits irrespective of the temporal delay specified. Most interestingly, circuits functioning out of steady state may exploit non-equilibrium absorbing states to transmit information optimally and feedback can additionally increase information transmission. We found that there are many degenerate topologies that transmit similar information equally optimally, a degeneracy that will most likely be lifted by considering more detailed molecular models.

The optimal solutions we found previously function strongly out of equilibrium, so they must consume energy. Since it has been experimentally shown [39] that sensory systems may have evolved to reduce their energy expenditure, we were interested in seeing how energetic constraints impact the form of the optimally informative solutions. This knowledge will prove useful when constructing artificial biochemical circuits |3D], or engineering living organisms for energy production [41]. The energy dissipated (or consumed) by a given network can be estimated by looking at the thermodynamics of its composite biochemical reactions. A completely reversible reaction does not consume energy. The reaction is in perfect equilibrium and the total free energy of the system is completely balanced. Irreversible reactions, such as certain steps of biochemical cascades, come at a cost to the cell, which has to prevent the back reaction from occurring. This cost can be estimated considering the flux balance of the network. The heat dissipated by the circuit is proportional to its rate of entropy production [42]. Tu et al. [43] looked at entropy production in biochemical regulatory networks and experimentally showed that the flagellar motor switch of Escherichia coli operates out of equilibrium, dissipating energy. A nonequilibrium allosteric model consistent with experimental results was proposed to explain how the switch operates with high sensitivity at a small energetic cost.

Energetic cost has also been discussed in relation to cellular precision and the predictive power of the cell. The chemosensory system of E. coli has been shown to dissipate energy in order to improve its adaptive speed and accuracy [44]. Reliable readout of input concentrations has also been bounded by the entropy production rate PKIHg. Others have reversed the perspective and shown that the minimum energy required for a biological sensor to detect a change in an environmental signal is proportional to the amount of information processed during the process [50]. In the case of the E. coli chemosensory system, it was argued that 5% of the energy consumed in sensing is determined by information-thermodynamic bounds, and is thus unavoidable m-. Becker et al. m showed that short-term prediction in a sensory module is possible in equilibrium, but only up to a finite time interval. For longer times accurate prediction requires large dissipation. Lastly, the inability of systems to use all knowledge of past environmental fluctuations to predict the future state has been directly linked to dissipation m-.

We want to see how the structure of optimal networks for information transmission changes if we impose a penalty on the entropy production of the system. In order to investigate the non-equilibrium nature of biochemical circuits that are optimal for delayed information transmission, we choose to study a simple binary model of a regulatory circuit that allows us to focus on the regulatory logic at small computational costs. Within this model we consider two interacting elements of biochemical regulatory networks (e.g. proteins and genes, elements of two component signaling systems, sugars and enzymes) that take on binary states (on or off) and evolve in continuous time. This simplification allows us to develop an efficient formalism for calculating information transmission at different readout delays and consider the connection between dissipation and different readout times. In the limit of infinite dissipation rates, we recover the previously obtained results [38]. For finite, nonzero dissipation rates, back reactions decrease the information transmission until it goes to zero for systems close to equilibrium. However, when feedback is allowed, networks are able to transmit almost 1 bit of information at no cost.

Optimizing biochemical networks for information transmission assumes that the circuit and its environment have coevolved to best match their statistical properties. For many networks this is a valid assumption. However, other networks function in a wide variety of variable conditions. To study what kind of network is best adapted to function in adverse environments we combine a game-theoretic maximin approach with the framework of information theory. We ask what system will maximally transmit information even when presented by the environment with the worst possible initial state, that is, the one that aims at minimizing information at all time delays. Interestingly, we find that, even if the amount of transmitted information is inevitably smaller, the structure of the optimal circuits is the same as when the environment has no detrimental effect and the system is able to optimize its initial condition.

Game-theoretic approaches have been used to robustly design biochemical networks and to devise biomimicking algorithms. Given environmental disturbances and uncertainty about the initial state, minimax strategies were used to match therapeutic treatment to a prescribed immune response [53) and to make a stochastic synthetic gene network achieve a desired steady state [53]. The adaptive response in bacterial chemotaxis has been interpreted as a maximin strategy that ensures the highest minimum chemoattractant uptake for any profile of concentration pp].

In the first section of this paper we discuss the effect of energetic constraints on information transmission at a time delay. We consider the case in which the system is at steady state and the signal up-regulates (or down-regulates) the response with and without feedback. In the second section we investigate how the system counteracts the worst possible initial condition presented by the environment in order to transmit as much information as possible. We finish with a discussion of our results and their interpretation in terms of biochemical regulatory networks.


II. INFORMATION TRANSMISSION WITH 
ENERGY DISSIPATION 

To focus on the tradeoffs between the ability of the network to transmit information and the energetic cost of the biochemical reactions that make up this network, we study the simplest model of regulation that allows us to focus on the logic of the reactions. Our simplified network (see Fig. 1) consists of two binary elements that describe either a transcription factor protein regulating a gene, or a signaling molecule activating/downregulating an enzyme or receptor. The first element of the network describes the input $z$ and can be associated with the state of a receptor, signaling molecule or transcription factor that responds to the external conditions. For example, it can describe the presence or absence of a sugar source in metabolism or phosphorylation of the histidine kinase in a two component signaling system. The output $x$ describes the final outcome of the network, such as the gene that produces the response protein to the external signal. In the examples given above, it corresponds to the enzyme that digests the sugar or expression of the target gene by the response regulator. Both of these elements can be found in the active ($x, z = +1$) or inactive ($x, z = -1$) states. If the described element is a continuous variable (e.g. protein concentration), the binary approximation is equivalent to taking very steep regulatory functions, such that the concentration is well described by two states: below and above the threshold.

FIG. 1: (a) Time evolution of the random variable $z_t$, which models a biochemical input transitioning from/to a down-regulated state ($-1$) to/from an up-regulated state ($+1$), with rates $\{u_m, u_p\}/\{d_m, d_p\}$, respectively. The random variable $x_t$ models activation ($+1$) or deactivation ($-1$) of a biochemical output: it is regulated by $z$, with which it aligns ('activation', or up-regulation) with rates $r_m$ or $r_p$ or anti-aligns ('repression', or down-regulation) with rates $s_m$ or $s_p$. The subscripts $m$ and $p$ in the rates account for the state of the other variable, that is $-1$ and $+1$, respectively. (b) The four network states, with corresponding transition rates given in (a).

This two component system can be found in one of four states: $(x, z) \in \{(-,-), (-,+), (+,-), (+,+)\}$, corresponding to both elements inactive, the input active and output inactive and vice versa, and both elements active. The input $z$ up/down regulates $x$ with rates $r_m$ ($r_p$) and $s_m$ ($s_p$), defined in Fig. 1, that depend on the state of the input ($m = -$, $p = +$). The state of the system is defined by the conditional probability $P(x_t, z_t, t | x_0, z_0, 0)$ to find the system in state $(x_t, z_t)$ at time $t$, conditional on the state $(x_0, z_0)$ at time $t = 0$. This conditional probability distribution can be arranged in a $4 \times 4$ matrix, and its evolution is described by the master equation

$$\partial_t P = -\mathcal{L} P \tag{1}$$

where the $4 \times 4$ transition matrix $\mathcal{L}$ is defined in terms of the rates depicted in the diagram in Fig. 1 (see the Appendix). The central quantity we shall be interested in is the joint probability distribution of the state $x_t$ of the output at time $t$ and the state $z_0$ of the input at time 0. We shall use the shorthand

$$P(x_t, z_0) = \sum_{z_t, x_0 = \pm 1} P(x_t, z_t, t | x_0, z_0, 0)\, P_0(x_0, z_0) \tag{2}$$

where $P_0(x_0, z_0)$ is the probability distribution of the system at the initial time.

We are interested in finding the network topologies that are optimal for information transmission over a fixed time scale $\tau$. Specifically we want to maximize the mutual information between an input signal at an initial time, $z_0$, and the output of the network which is read out at a later time, $x_t$:

$$\mathcal{I}(\tau) = I[x_{t=\tau/\lambda}, z_0] \tag{3}$$

over the rates of the biochemical reactions $\mathcal{L}$ of the regulatory network, where:

$$I[x_t, z_0] = \sum_{x_t, z_0} P(x_t, z_0) \log \frac{P(x_t, z_0)}{P(x_t)\, P_0(z_0)} \tag{4}$$

and $P_0(z_0) = \sum_{x_0} P_0(x_0, z_0)$, $P(x_t) = \sum_{z_0} P(x_t, z_0)$. We will measure the time $t = \tau/\lambda$ between the signal and the delayed read-out in units of the natural timescale of the problem, the relaxation time $\lambda^{-1}$, calculated as the inverse of the minimal non-zero eigenvalue of $\mathcal{L}$. Previously we found the networks that are best suited for transmitting information at a delay and discovered that they correspond to systems that function out of equilibrium.

For this reason we are interested in posing the same question, but taking into account energy constraints. We thus constrain the energy $Q$ dissipated per unit time into an external medium at temperature $T$ that is in contact with our system. $Q$ is related to the thermodynamic entropy production rate $\sigma$, $Q = k_B T \sigma$, where $k_B$ is the Boltzmann constant [42, 55]. In steady state the thermodynamic entropy production rate takes the form

$$\sigma = \sum_{i,j} P_i^\infty w_{ij} \log \frac{w_{ij}}{w_{ji}}, \tag{5}$$

where, in terms of the shorthand $i, j = (x_t, z_t)$ to denote the states, $w_{ij}$ is the transition rate from state $i$ to $j$ and $P_i^\infty$ is the steady state probability distribution for state $i$.

In order to intuitively understand the expression in Eq. 5 we link it to the non-equilibrium properties of the system. In steady state the master equation satisfies

$$P_i^\infty w_{ij} - P_j^\infty w_{ji} = \pm J \tag{6}$$

with the $+$ ($-$) sign that holds for all pairs of states where $i$ follows $j$ in the clockwise direction in Fig. 1 and $J$ is the steady state current. The detailed balance condition

$$P_i^\infty w_{ij} - P_j^\infty w_{ji} = 0 \quad \forall\, i, j, \tag{7}$$

is a special case of Eq. 6 where $J = 0$. In terms of the current defined in Eq. 6 the steady state entropy production rate in Eq. 5 becomes

$$\sigma = J \log \frac{w_{12}\, w_{24}\, w_{43}\, w_{31}}{w_{21}\, w_{13}\, w_{34}\, w_{42}} \tag{8}$$

(see Appendix for derivation). In order to maintain a non-equilibrium steady state ($J \neq 0$) the system has to dissipate energy at rate $k_B T \sigma$.
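The equivalence between the transition-by-transition sum and the cycle form of the entropy production can be checked numerically. The sketch below is illustrative only: it assumes a generic four-state ring with hypothetical rates (not the rates of the paper's model), labels the states 0 to 3 around the ring rather than using the numbering of Fig. 1, and takes the sum over transitions of $P_i w_{ij} \log(w_{ij}/w_{ji})$ as the steady-state entropy production. The steady state is found by simple Euler relaxation, which is a convenience choice and not the authors' method.

```python
import math

def steady_state(forward, backward, n_iter=20000, h=0.01):
    """Relax a ring master equation to its steady state with Euler steps.

    forward[i]  : rate of the transition i -> i+1 (indices modulo n)
    backward[i] : rate of the transition i+1 -> i
    """
    n = len(forward)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        new = p[:]
        for i in range(n):
            j = (i + 1) % n
            flow = h * (p[i] * forward[i] - p[j] * backward[i])
            new[i] -= flow
            new[j] += flow
        p = new
    return p

def entropy_production_sum(p, forward, backward):
    """Sum over all directed transitions of P_i w_ij log(w_ij / w_ji)."""
    n = len(p)
    total = 0.0
    for i in range(n):
        j = (i + 1) % n
        total += p[i] * forward[i] * math.log(forward[i] / backward[i])
        total += p[j] * backward[i] * math.log(backward[i] / forward[i])
    return total

def entropy_production_cycle(p, forward, backward):
    """Steady-state current times the log ratio of rate products (cycle form)."""
    current = p[0] * forward[0] - p[1] * backward[0]
    affinity = sum(math.log(f / b) for f, b in zip(forward, backward))
    return current * affinity

# Hypothetical rates that break detailed balance.
fwd = [1.0, 0.5, 2.0, 0.7]
bwd = [0.3, 0.5, 0.1, 0.9]
p_inf = steady_state(fwd, bwd)
print(entropy_production_sum(p_inf, fwd, bwd),
      entropy_production_cycle(p_inf, fwd, bwd))
```

When the product of forward rates equals the product of backward rates, the current vanishes and both expressions return zero, which is the detailed balance case of Eq. 7.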

We are interested in solving the problem of finding the best network design that can perform a maximally informative delayed readout given a limited and fixed amount of $k_B T \sigma$ units of energy per unit time. This question can be addressed quantitatively by introducing a Lagrange multiplier $\ell$ that constrains the energy cost of the transmitted information and maximizing the functional

$$\mathcal{I}(\tau) - \ell\, \frac{\sigma}{\lambda \log 2} = \mathcal{I}(\tau) - \ell\, \hat\sigma \tag{9}$$

over the circuit's reaction rates, $\mathcal{L}$. We rescale the rate of energy dissipation, $\sigma$, by the constant $\lambda \log 2$ and call it $\hat\sigma$, in order to express both information and entropy production in bits and to measure time in units of the characteristic timescale $1/\lambda$.

For $\ell = 0$ the constraint on the dissipated energy does not enter the optimization and one recovers the results found without imposing energetic constraints ($\sigma = \hat\sigma = \infty$) [38]. In this limit the system is driven out of equilibrium and at least one of the rates vanishes. At the other extreme, when $\ell = \infty$, any deviation from equilibrium is severely punished and we expect to find the system in equilibrium.

Some intuition about the optimal solutions can be gained before embarking on detailed calculations. In general we can write the probability distribution as $P(x_t, z_0) = (a + b x_t + c z_0 + \mu x_t z_0)/4$. The symmetry between the on and off states, $P(+,+) = P(-,-)$ and $P(+,-) = P(-,+)$, implies $b = c = 0$ and normalization $\sum_{x_t, z_0} P(x_t, z_0) = 1$ gives $a = 1$. Therefore $P(x_t, z_0)$ has to be of the form

$$P(x_t, z_0) = \frac{1 + \mu x_t z_0}{4}. \tag{10}$$

Eq. 10 means $P(x_t) = P(z_0) = 1/2$, which independently maximizes the entropy of the input, $H[z_0]$, and output distribution, $H[x_t]$, where $H[y] = -\sum_y P(y) \log P(y)$. With this form for $P(x_t, z_0)$ the mutual information in Eq. 4 becomes:

$$\mathcal{I} = \frac{1+\mu}{2} \log(1+\mu) + \frac{1-\mu}{2} \log(1-\mu), \tag{11}$$

where in general $|\mu| \le 1$. The symmetry of the system results in a degeneracy of solutions, which we break by setting one of the input flipping rates to a fixed value $r = 1$. With this choice, the allowed range of $\mu$ is $[0,1]$, and information is a monotonically increasing function of the "effective magnetization" $\mu$, and is maximized for $\mu = 1$ giving $\mathcal{I} = 1$ bit. We compute $\mu$ explicitly for specific models in the following sections.
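As a quick consistency check on the reduction from the general mutual information to the single-parameter form, the sketch below evaluates both for the symmetric joint distribution $(1 + \mu x_t z_0)/4$. The function names are illustrative, and logarithms are taken in base 2 so that the result is in bits, in line with the statement that $\mu = 1$ gives 1 bit.

```python
import math

def symmetric_joint(mu):
    """Joint distribution P(x, z) = (1 + mu * x * z) / 4 over x, z in {+1, -1}."""
    return {(x, z): (1.0 + mu * x * z) / 4.0 for x in (1, -1) for z in (1, -1)}

def mutual_information_direct(joint):
    """Mutual information in bits computed directly from a joint distribution."""
    px, pz = {}, {}
    for (x, z), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        pz[z] = pz.get(z, 0.0) + p
    return sum(p * math.log2(p / (px[x] * pz[z]))
               for (x, z), p in joint.items() if p > 0.0)

def mutual_information_mu(mu):
    """Closed form in terms of the effective magnetization mu, in bits."""
    total = 0.0
    for sign in (1.0, -1.0):
        w = 1.0 + sign * mu
        if w > 0.0:  # the term w * log(w) vanishes as w -> 0
            total += 0.5 * w * math.log2(w)
    return total

for mu in (0.0, 0.37, 0.9, 1.0):
    print(mu, mutual_information_direct(symmetric_joint(mu)), mutual_information_mu(mu))
```

The two columns agree for every $\mu$ in $[0, 1]$, and the closed form is visibly increasing in $\mu$, which is why the rest of the analysis can focus on maximizing the effective magnetization.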


A. Simplest model

First we consider the simplest case depicted in Fig. 2, where we set all the rates for flipping of the input $z$ to be equal, $u_p = u_m = d_p = d_m = u$, but allow the rates for the output $x$ to be different if the output is aligning with the input, $r_p = r_m = r$, or it is anti-aligning, $s_p = s_m = s$. This model allows $z$ to activate (or repress) $x$ with rate $r$ ($s$), respectively, but does not allow for feedback since the flipping rate of the input does not depend on the state of the output. We diagonalize the rate matrix $\mathcal{L}$ for this model analytically and find the eigenvalues to be $\{0, 2u, 1 + s, 1 + s + 2u\}$ (see Appendix C 1 for details).

FIG. 2: The four network states, with corresponding transition rates, in the simplest case where input $z$ can either up or down-regulate the output $x$ but there is no feedback. The input $z$ switches with the same rate regardless of the state of the output $x$.


We can express the mutual information explicitly in the form given by Eq. 11 with

$$\mu = \frac{(1-s)\left[(1+s+2u)\, e^{-2u\tau/\lambda} - 4u\, e^{-(1+s)\tau/\lambda}\right]}{(1+s)^2 - 4u^2}, \tag{12}$$

where time is rescaled by the smallest nonzero eigenvalue $\lambda$, as specified in Eq. 3.

The rescaled entropy production rate (Eq. 9) becomes

$$\hat\sigma = \frac{(1-s)\, u \log\frac{1}{s}}{\lambda\, (1+s+2u)}. \tag{13}$$

Given that the smallest nonzero eigenvalue can be either $\lambda = 2u$ or $\lambda = 1 + s$, we can define the quantity $\gamma = (1+s)/(2u)$ and distinguish two regimes: $\gamma > 1$ in which the output changes on faster timescales than the input and $\gamma < 1$, where the input changes more quickly than the output. In general, for each set of rates, the two eigenvalues must be compared and the value of $\lambda$ (and thus $\gamma$) determined.
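The closed-form correlation of the simplest model can be cross-checked against a direct integration of the four-state master equation. The sketch below is a hedged illustration rather than the authors' procedure: it works in unrescaled time $t$, fixes $r = 1$, starts from the steady state $P^\infty(x, z) = (1 + \mu_0 x z)/4$ with $\mu_0 = (1-s)/(1+s+2u)$, and propagates the auxiliary vector $q(x, z) = z\, P^\infty(x, z)$ with a hand-written fourth-order Runge-Kutta step. All function names and the parameter values are hypothetical.

```python
import math

STATES = [(x, z) for x in (1, -1) for z in (1, -1)]

def mu_closed(t, u, s):
    """Closed-form <x_t z_0> for r = 1 in unrescaled time (requires 1 + s != 2u)."""
    a, b = 1.0 + s, 2.0 * u
    num = (a + b) * math.exp(-b * t) - 2.0 * b * math.exp(-a * t)
    return (1.0 - s) * num / (a * a - b * b)

def rhs(q, u, s, r=1.0):
    """Right-hand side of the master equation for a vector q[(x, z)]."""
    dq = {}
    for (x, z) in STATES:
        flip_out = s if x == z else r  # x leaves (x, z): anti-aligns (s) or aligns (r)
        flip_in = r if x == z else s   # x arrives from (-x, z)
        dq[(x, z)] = (u * (q[(x, -z)] - q[(x, z)])
                      + flip_in * q[(-x, z)] - flip_out * q[(x, z)])
    return dq

def rk4_step(q, h, u, s):
    """One classical Runge-Kutta step of size h."""
    def shifted(base, slope, c):
        return {k: base[k] + c * slope[k] for k in base}
    k1 = rhs(q, u, s)
    k2 = rhs(shifted(q, k1, h / 2.0), u, s)
    k3 = rhs(shifted(q, k2, h / 2.0), u, s)
    k4 = rhs(shifted(q, k3, h), u, s)
    return {k: q[k] + h / 6.0 * (k1[k] + 2.0 * k2[k] + 2.0 * k3[k] + k4[k]) for k in q}

def mu_numeric(t, u, s, n_steps=2000):
    """<x_t z_0> from integrating the master equation from the steady state."""
    mu0 = (1.0 - s) / (1.0 + s + 2.0 * u)
    q = {(x, z): z * (1.0 + mu0 * x * z) / 4.0 for (x, z) in STATES}
    h = t / n_steps
    for _ in range(n_steps):
        q = rk4_step(q, h, u, s)
    return sum(x * q[(x, z)] for (x, z) in STATES)

print(mu_closed(1.5, 0.3, 0.2), mu_numeric(1.5, 0.3, 0.2))
```

To compare with the rescaled delay used in the text, call either function with `t = tau / min(2 * u, 1 + s)`.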


1. Numerical results: 


To get an idea about the behavior of the system we will first solve the optimization problem numerically and then interpret the results in terms of the limiting cases. For each readout delay $\tau$ and entropy production rate $\hat\sigma$, we look for rates that maximize $\mathcal{I}(\tau)$ (given by Eqs. 11 and 12) while constraining the rates at fixed $\hat\sigma$ (given by Eq. 13).

The maximal mutual information values (capacities) of the optimal networks display an intuitive behavior as functions of the dissipated energy and time delay of the readout. The mutual information between the input and output of the optimal network decreases with the time delay of the readout for all values of dissipation (see Fig. 3a), as the network decorrelates. Allowing the system to dissipate more energy increases its capacity to transmit information. Above a certain value of dissipated energy the capacity plateaus and reaches the same value we observed when dissipation was not constrained [38]. The value of this plateau decreases with an increase of the time delay $\tau$ of the readout (see section II A 5 for the functional dependence). The transmitted information decreases to zero linearly with dissipation for all readout delays, $\mathcal{I}^* \propto c(\tau)\,\hat\sigma$, where $c(\tau)$ is a $\tau$-dependent constant derived in section II A 4. Naturally, the capacities for systems that can dissipate a lot of energy are much larger than those with large energy constraints. However, at small time delays the rate of decay of the capacity with time delay is larger for circuits that function far out of equilibrium than for those that are close to equilibrium (see section II A 3).
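A minimal way to reproduce the qualitative trends described above is a brute-force search. The sketch below is not the authors' optimization procedure: it assumes the closed forms of the simplest model with $r = 1$, assumes a natural logarithm in the rescaled entropy production of Eq. 13, scans a logarithmic grid of input rates $u$, and for each $u$ solves for the back reaction rate $s$ that meets the target $\hat\sigma$ by bisection. Grid bounds and function names are arbitrary illustrative choices.

```python
import math

def relaxation_rate(u, s):
    """Smallest nonzero eigenvalue of the simplest model."""
    return min(2.0 * u, 1.0 + s)

def sigma_hat(u, s):
    """Rescaled entropy production (natural log assumed)."""
    lam = relaxation_rate(u, s)
    return (1.0 - s) * u * math.log(1.0 / s) / (lam * (1.0 + s + 2.0 * u))

def magnetization(tau, u, s):
    """Effective magnetization at rescaled delay tau, written to stay finite at 1 + s = 2u."""
    a, b = 1.0 + s, 2.0 * u
    t = tau / relaxation_rate(u, s)
    if abs(a - b) < 1e-12:
        ratio = t * math.exp(-a * t)
    else:
        ratio = (math.exp(-b * t) - math.exp(-a * t)) / (a - b)
    return (1.0 - s) / (a + b) * (math.exp(-b * t) + 2.0 * b * ratio)

def information_bits(mu):
    """Mutual information in bits for effective magnetization mu."""
    total = 0.0
    for sign in (1.0, -1.0):
        w = 1.0 + sign * mu
        if w > 0.0:
            total += 0.5 * w * math.log2(w)
    return total

def solve_s(u, target):
    """Bisection in log(s): sigma_hat decreases monotonically as s grows toward 1."""
    lo, hi = -460.0, -1e-12
    if sigma_hat(u, math.exp(lo)) < target:
        return None  # target dissipation not reachable for this u
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sigma_hat(u, math.exp(mid)) > target:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def capacity(tau, target, n_grid=400):
    """Best (information, u, s) on a log-spaced grid of u at fixed sigma_hat."""
    best = (0.0, None, None)
    for k in range(n_grid):
        u = 10.0 ** (-4.0 + 5.0 * k / (n_grid - 1))
        s = solve_s(u, target)
        if s is None:
            continue
        info = information_bits(magnetization(tau, u, s))
        if info > best[0]:
            best = (info, u, s)
    return best

for tau in (0.5, 2.0):
    for target in (0.01, 5.0):
        print(tau, target, capacity(tau, target))
```

On this grid the capacity grows with the allowed dissipation and shrinks with the readout delay, matching the trends stated in the text; the precise optimal rates depend on the grid resolution.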

In Figs. 3c and 3d we plot the values of the rate constants of the optimal networks that result in the capacities plotted in Fig. 3a. We see that, similarly to the capacity values, the optimal rates are continuous. To gain a better idea about the network topologies that give optimal networks we have used the rates to broadly classify the circuit topologies in the phase diagram in Fig. 3b, with the topologies defined in Fig. 3e. In the limit of large dissipation we recover the results we obtained previously [38]: in the optimal circuit at large readout delays the flipping of the output is governed by an irreversible fast reaction with rate $r$ fixed to 1 (the back reaction is forbidden, $s^* = 0$). The output follows the state of the input and the change in the input is described by a reversible slower reaction with rate $u^* < r$ (network A in Fig. 3e). For shorter delays the flipping rate of the input decreases, causing the capacity to increase. As $\tau \to 0$, $u^* \to 0$ and we obtain two separate subnetworks with a fixed input in which the output changes quickly to follow the input (network C in Fig. 3e).

At large readout delays, the equilibrium solution at $\hat\sigma \to 0$ is very similar to the non-equilibrium one, but now detailed balance must always be obeyed. The detailed balance condition imposes that the output change is completely reversible and now $s^* \neq 0$. At $\hat\sigma = 0$ the forward and back reactions are completely balanced with $s^* = r$ (network B in Fig. 3e). Additionally, the input changes on the same very fast timescale $u^* \approx r = 1$, faster than for large $\hat\sigma$. Not surprisingly this essentially randomly flipping equilibrium circuit at large delays is not able to reliably transmit information, and $\mathcal{I} \approx 0$. For short time delays, similarly as in the large $\hat\sigma$ limit, $u^* \to 0$ and we obtain two subcircuits with the output flipping back and forth at the same rate $s^* = r$, at $\hat\sigma = 0$ (network D in Fig. 3e). Allowing for small amounts of dissipation breaks detailed balance and decreases the rate of the output's back reaction ($s^* < 1$), so that the output is more likely to be in the same state as the input.

FIG. 3: (a) Contour plot of optimal mutual information $\mathcal{I}^*$ as function of the readout delay $\tau$ and entropy production rate $\hat\sigma$. (b) Phase diagram in the $(\hat\sigma, \tau)$ plane of the optimal network topologies A, B, C, D (sketched in panel (e)). (c) Contour plot of optimal rate $u^*$ as function of the readout delay $\tau$ and entropy production rate $\hat\sigma$. (d) Contour plot of optimal parameter $s^*$ as function of the readout delay $\tau$ and entropy production rate $\hat\sigma$. (e) Sketch of optimal network topologies A, B, C, D.

In summary, network C, which has a fixed input that is followed by the output on fast timescales, is the most informative solution. The capacity of this system is reached at finite values of $\hat\sigma$, and does not increase further as $\hat\sigma \to \infty$. This topology is optimal for a wide range of $\hat\sigma$, with the back reaction rate $s$ continuously increasing as the constraints on dissipation impose solutions closer to equilibrium, until network D with the randomly flipping outputs is reached. At small time delays the optimal solution always keeps the input fixed and adjusts the state of the output to the input ($2u < r$). But for large $\tau$ the input will change ($2u \sim r$) and the amount of energy that can be dissipated controls whether the output simply follows the input (network A in Fig. 3e), or is forced to switch independently (network B in Fig. 3e). Information can therefore be lost both in circuits where the output does not have the energy to follow the input (network D) and in circuits where the input decorrelates with time (network A), or both of these scenarios apply (network B).

Lastly, one can interpret the optimal circuits in terms of the relaxation rate of the system (smallest nonzero eigenvalue). The ratio of the two potentially smallest eigenvalues, $\gamma$, is given by $(1+s)/(2u)$, the ratio of the output and the input switching rates. Fig. 4 shows the optimal value $\gamma^*$ as a function of the delay $\tau$, in the limit of small entropy production ($\hat\sigma = 0.0007$ bits) and of large entropy production ($\hat\sigma = 7$ bits). As noted before, for small time delays optimal circuits are those where the input changes more slowly than the output ($\gamma^* > 1$), for all values of dissipation. However for large $\tau$, we define a certain value $\tau_c$, at which the input and the output timescales match in optimal circuits, with $\gamma^* = 1$.

FIG. 4: The quantity $\gamma = (1+s)/(2u)$ is the ratio of the two smallest nonzero eigenvalues, corresponding to the output and the input timescale, respectively. The optimal value $\gamma^*$ is shown as a function of delay $\tau$, in the two limits of very small and very large entropy production $\hat\sigma$ (measured in bits). The time $\tau_c$ after which the output timescale matches the input timescale ($\gamma^* = 1$) decreases as dissipation is reduced, from $(1+\sqrt{3})/2$ when $\hat\sigma \to \infty$ to $(1+\sqrt{5})/4$ when $\hat\sigma \to 0$.

The value of $\tau_c$ corresponds to the optimal rate of input flipping $u^*$ reaching a constant value and depends on the rate of dissipation. For $\hat\sigma \ll 1$, $\tau_c = (1+\sqrt{5})/4$ and $u^* \sim 1$ (see section C 3 for a derivation). For large dissipation rates $\hat\sigma \gg 1$, this delay increases, $\tau_c = (1+\sqrt{3})/2$, and the rate of change of the input decreases to $u^* = 0.5$ (see section II A 5). At large delays $\tau$, matching the input and output switching rates allows the system to transmit more information. This matching of timescales is possible at $\tau > \tau_c > 0$ even if the system cannot dissipate energy (small $\hat\sigma$). Finally, the optimal solution always is in the $\gamma^* \ge 1$ limit, where the input changes no faster than the output.

Having understood the general behavior of the capacity of this model, we can exploit its simplicity to obtain precise analytical scaling results in the limits of small and large delays and dissipation.


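2. Limit $\tau = 0$


The simplest case is that of instantaneous readout, $\tau = 0$, where the effective magnetization $\mu$ is

$$\mu = \frac{1-s}{1+s+2u} = \frac{1-s}{1+s}\, \frac{\gamma}{\gamma+1}. \tag{14}$$

We can formally rewrite Eq. 13 by parametrizing the back reaction rate as $s = \exp[-2\beta\hat\sigma]$ (Eq. 15).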

As we know from our numerical exploration, in the $\tau = 0$ limit the capacity strongly depends on the value of $\hat\sigma$. First we can explore the limit of large dissipation, where we know from our previous work (and from the results presented in Fig. 3) that $s^*$ is small. In this limit Eq. 15 simplifies (Eq. 13 is explicitly solved for $s$) and $\beta$ is a function of only $u$ and $\lambda$, not of $\hat\sigma$. To find $\gamma^*$ that maximizes $\mu$, we exploit the parametrization of $s$ in Eq. 15 to write

$$\mu = \tanh\left(\beta(u, \lambda)\, \hat\sigma\right) \frac{\gamma}{1+\gamma}. \tag{16}$$

At fixed but large $\hat\sigma$ the largest value of $\mu$ is always achieved for $\gamma^* = \infty$. This means the output changes on faster timescales than the input and the smallest eigenvalue is $\lambda = 2u$. More precisely, as in the dissipation-less case [35], the optimal rate is $u^* = 0$ at $\tau = 0$.

To find $s^*$, we substitute the parametrization in Eq. 15 for $s$ into Eq. 13 with $\lambda = 2u$ and obtain at fixed $\hat\sigma$:

$$1 + \frac{1}{\gamma} = \beta \tanh(\beta \hat\sigma). \tag{17}$$

Since $\gamma^* = \infty$, $\beta^*$ must satisfy $\beta^* \tanh(\beta^* \hat\sigma) = 1$. For large dissipation rates, $\beta^* \sim 1$, $s^* \sim e^{-2\hat\sigma}$ is exponentially small as we had assumed and $\mu^* \approx \tanh(\hat\sigma) \sim 1$. This results in the optimal information $\mathcal{I}^* \sim 1$ bit.

For small dissipation rates, the general expression in Eq. 15 holds, where $\beta$ is a
nonlinear function of $\sigma$, $\beta(\sigma)$. Eqs. 16 and 17 and the arguments presented
above still hold, resulting in a maximum $\mu(\tau = 0)$ when $\gamma^* = \infty$ and
$u^* = 0$. For small dissipation rates and taking $\gamma^* = \infty$, Eq. 17 becomes
$\beta^*(\sigma) \approx 1/\sqrt{\sigma}$, and Eq. 16 results in the effective
magnetization $\mu \approx \beta^*(\sigma)\,\sigma \approx \sqrt{\sigma}$. Finally, the
optimal mutual information goes to 0 linearly with the rescaled dissipation,
$I^* \approx (\mu^*)^2/2 \approx \sigma/2$ bits.
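The link between the effective magnetization and the information in bits can be sketched numerically. The snippet below is a minimal illustration, assuming the symmetric joint distribution $P(x, z) = (1 + \mu x z)/4$ (the form used for the initial condition in Eq. 54); the function names are hypothetical. It also evaluates the quadratic small-$\mu$ expansion, written with the explicit factor of $\log 2$ that appears in Eq. 30.

```python
import math


def mutual_information_bits(mu: float) -> float:
    """Mutual information (bits) of two +/-1 variables with P(x, z) = (1 + mu*x*z)/4."""
    total = 0.0
    for p in (1.0 + mu, 1.0 - mu):   # 4*P for aligned and anti-aligned pairs
        if p > 0.0:                  # convention 0*log(0) = 0
            total += 0.5 * p * math.log2(p)
    return total


def mu_instantaneous(s: float, gamma: float) -> float:
    """Eq. 14: effective magnetization at tau = 0."""
    return (1.0 - s) / (1.0 + s) * gamma / (gamma + 1.0)


def small_mu_approx_bits(mu: float) -> float:
    """Leading-order expansion I ~ mu^2 / (2 ln 2), valid for |mu| << 1."""
    return mu * mu / (2.0 * math.log(2.0))
```

For example, `mutual_information_bits(mu_instantaneous(0.0, 1e12))` is close to 1 bit, the large-dissipation value quoted above.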


3. Limit $\tau \ll 1$


The results from the $\tau = 0$ limit serve as a basis for considering the scaling of the
mutual information for small, but finite $\tau \ll 1$. Since $\gamma^*$ diverges at
$\tau = 0$, we assume that for $\tau \to 0$ the smallest eigenvalue is still $2u$
and $\gamma^* \gg 1$. We will also use the generalized nonlinear parametrization of $s$ in
Eq. 15 with $\beta(\sigma, \gamma)$ in the small dissipation regime, as we did for
$\tau = 0$,

$$s = \exp[-2\beta(\sigma, \gamma)\,\sigma], \qquad (15)$$

where $\beta(\sigma, \gamma)$ is in general a nonlinear function of $\sigma$ and $\gamma$.
This form agrees with the numerical results for $s^*$, which show a strong decay with
$\sigma$.

In the small dissipation limit $\sigma \ll 1$, Eq. 17 becomes:

$$\beta \approx \frac{1}{\sqrt{\sigma}}\sqrt{\frac{\gamma + 1}{\gamma}}. \qquad (18)$$

Using this result, the effective magnetization at fixed $\sigma$ is a function of only
$\gamma$ and $\sigma$

$$\mu \approx \sqrt{\sigma}\,\sqrt{\frac{\gamma}{\gamma + 1}}\;\frac{(\gamma + 1)e^{-\tau} - 2e^{-\gamma\tau}}{\gamma - 1}. \qquad (19)$$

We maximize the effective magnetization with respect to $\gamma$,
$\partial\mu/\partial\gamma = 0$, and assume the scaling
$\gamma^* \approx a_0/\tau + b_0 + c_0\tau$. Solving the resulting equations for the
coefficients in orders of $\tau$ (see Appendix C 2 for details), the optimal
effective magnetization is

$$\mu^* \approx \sqrt{\sigma}\,(1 + A_0\tau), \qquad (20)$$

where $A_0 = -0.24\ldots$ is computed exactly in Appendix C 2.


FIG. 5: Comparison of the analytical (dashed lines) and numerical solutions (solid lines) for optimal mutual information $I^*$. In
panel (a) the behavior for small values of entropy production $\sigma$ is shown for $\tau = 0.1$ and for $\tau = 1$ as presented in section II A 4.
In panel (b) the dependence on $\tau$ is represented for $\sigma = 0.29$ bits and $\sigma = 7.21$ bits as presented in sections II A 4 and II A 5.

In the large dissipation limit $\sigma \gg 1$, Eq. 17 becomes

$$\beta \approx \frac{\gamma + 1}{\gamma}. \qquad (21)$$

Using this result, $\lambda = 2u$ and the fact that in this limit $s \to 0$,
the effective magnetization is

$$\mu = \frac{\gamma}{\gamma^2 - 1}\left[(\gamma + 1)e^{-\tau} - 2e^{-\gamma\tau}\right]. \qquad (22)$$

Assuming $\gamma^* \approx a_\infty/\tau + b_\infty + c_\infty\tau$, the analogous calculation
to the small dissipation limit results in the maximized
effective magnetization

$$\mu^* \approx 1 + A_\infty\tau + B_\infty\tau^2, \qquad (23)$$

where $A_\infty = -0.63\ldots$ and $B_\infty = 0.23\ldots$ are computed
exactly in Appendix C 2.

Summarizing, in the small dissipation limit we find
$I^* \approx (\mu^*)^2/2 \approx \sigma(1 + 2A_0\tau)/2$ bits, a linear scaling of
the information both with dissipation and with readout
delay. In the large dissipation limit the information is independent
of the dissipation and $I^* \to 1$ bit, as $\mu^*$ tends
to one quadratically in the delay, as given by Eq. 23.
This scaling behavior is compared with numerical results
in Fig. 5.
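The linear coefficients quoted above can be cross-checked without the appendix expansion. The sketch below is only an illustration under stated assumptions: it maximizes Eq. 22 and the $\gamma$-dependent factor of Eq. 19 over $\gamma$ by a golden-section search in $\log\gamma$ (assuming a single maximum), at a small hypothetical delay $\tau = 10^{-3}$, and reads off the slope $(\mu^* - 1)/\tau$. All function names are hypothetical.

```python
import math


def mu_large_sigma(gamma, tau):
    """Eq. 22 (s -> 0): delayed effective magnetization."""
    bracket = (gamma + 1.0) * math.exp(-tau) - 2.0 * math.exp(-gamma * tau)
    return gamma / (gamma ** 2 - 1.0) * bracket


def c_small_sigma(gamma, tau):
    """mu / sqrt(sigma) at small dissipation (Eq. 19)."""
    bracket = (gamma + 1.0) * math.exp(-tau) - 2.0 * math.exp(-gamma * tau)
    return math.sqrt(gamma / (gamma + 1.0)) * bracket / (gamma - 1.0)


def maximize_over_gamma(f, tau, lo=1.5, hi=1e8, iters=200):
    """Golden-section search in log(gamma); assumes f has a single maximum."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        x1 = b - phi * (b - a)
        x2 = a + phi * (b - a)
        if f(math.exp(x1), tau) < f(math.exp(x2), tau):
            a = x1          # maximum lies to the right of x1
        else:
            b = x2          # maximum lies to the left of x2
    g = math.exp(0.5 * (a + b))
    return g, f(g, tau)


tau = 1e-3
_, mu_inf = maximize_over_gamma(mu_large_sigma, tau)
_, c0 = maximize_over_gamma(c_small_sigma, tau)
slope_inf = (mu_inf - 1.0) / tau   # estimate of A_infinity
slope_0 = (c0 - 1.0) / tau         # estimate of A_0
```

The two slopes should land near the values $A_\infty \approx -0.63$ and $A_0 \approx -0.24$ given in the text, up to corrections of order $\tau$.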


4. Limit $\sigma \ll 1$


The scaling at small dissipation with $\sigma$ for all $\tau$ is obtained by noting from
Eq. 13 that in this limit $s \to 1$. In the regime where $\lambda = 2u$, this behavior is
clear. If $\lambda = 1 + s$, small $\sigma$ could also be obtained setting $u \approx 0$,
yet this would mean that $2u < 1 + s$, and $\lambda = 2u$. So the
only consistent solution demands $s \to 1$.

We set $s = 1 - \epsilon$ and expand the effective magnetization to leading order
in $\epsilon$

$$\mu \approx \frac{\epsilon}{2}\,\frac{\gamma\left[(1 + \gamma)e^{-\tau} - 2e^{-\gamma\tau}\right]}{\gamma^2 - 1}, \qquad (24)$$

and similarly Eq. 13

$$\sigma \approx \frac{\gamma\,\epsilon^2}{4(1 + \gamma)\log 2}. \qquad (25)$$

Eliminating $\epsilon$ from Eqs. 24 and 25 reads $\mu \approx c(\gamma, \tau)\sqrt{\sigma}$
in the small dissipation regime. To derive the
proportionality coefficient $c(\gamma, \tau)$ we solve Eq. 25 for $\epsilon$

$$\epsilon \approx 2\sqrt{\sigma\,\frac{\gamma + 1}{\gamma}\,\log 2}, \qquad (26)$$

and use Eq. 24 to find

$$c(\gamma, \tau) = \sqrt{\frac{\gamma \log 2}{\gamma + 1}}\;\frac{-2e^{-\gamma\tau} + (1 + \gamma)e^{-\tau}}{\gamma - 1}. \qquad (27)$$

For each value of $\tau$ the function $c(\gamma, \tau)$ has a single
maximum in $\gamma^*$, which is a decreasing function of $\tau$ and
satisfies the transcendental equation in Eq. C10. In the
$\gamma \to 1^+$ limit, the maximum of $c(\gamma, \tau)$ reaches $\gamma^* = 1$ at

$$\tau_c = \frac{1 + \sqrt{5}}{4} \qquad (28)$$

(see Appendix C3 for details of the derivation). For all
larger values of $\tau > \tau_c$, $\gamma^* = 1$ and

$$c(\tau) = c(\gamma = 1, \tau) = \frac{e^{-\tau}(1 + 2\tau)\sqrt{\log 2}}{\sqrt{2}}. \qquad (29)$$

The optimal mutual information $I^*$ is linear in dissipation
and exponentially decaying in $\tau$

$$I^* \approx \frac{c(\tau)^2\,\sigma}{2\log 2}. \qquad (30)$$

The comparison with the numerical result is shown in
Fig. 5.
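The statement that the optimal $\gamma^*$ decreases with $\tau$ and sticks to $\gamma^* = 1$ beyond $\tau_c$ can be explored with a brute-force scan. This is a minimal sketch, assuming the forms of Eqs. 27 to 29 as written above and a hypothetical uniform grid in $\gamma \in [1, 200]$; `c_coeff` and `best_gamma` are illustrative names.

```python
import math

LN2 = math.log(2.0)
TAU_C = (1.0 + math.sqrt(5.0)) / 4.0   # Eq. 28


def c_coeff(gamma, tau):
    """Eq. 27; at gamma = 1 the removable singularity is replaced by Eq. 29."""
    if abs(gamma - 1.0) < 1e-9:
        return math.exp(-tau) * (1.0 + 2.0 * tau) * math.sqrt(LN2 / 2.0)
    bracket = (1.0 + gamma) * math.exp(-tau) - 2.0 * math.exp(-gamma * tau)
    return math.sqrt(gamma * LN2 / (gamma + 1.0)) * bracket / (gamma - 1.0)


def best_gamma(tau, gamma_max=200.0, steps=20000):
    """Grid point in [1, gamma_max] that maximizes c(gamma, tau)."""
    grid = [1.0 + (gamma_max - 1.0) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda g: c_coeff(g, tau))


def info_small_sigma_bits(tau, sigma):
    """Eq. 30, meant for tau > TAU_C where gamma* = 1."""
    return c_coeff(1.0, tau) ** 2 * sigma / (2.0 * LN2)
```

Scanning a few delays on either side of `TAU_C` shows the maximizer moving towards the lower edge of the grid as $\tau$ grows.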


5. Limit $\sigma \gg 1$

In the large dissipation limit Eq. 13 is satisfied only
if $s \to 0$ with $u$ bounded by $u < 0.5$, regardless of the
initial assumptions about $\lambda$. The optimal solution is
thus in the $\gamma \geq 1$ regime and we can extend the observations
from the small $\tau$ limit to postulate that the effective
magnetization is weakly dependent on the entropy production

$$\mu \approx c_\infty(\gamma, \tau) = \frac{\gamma}{\gamma^2 - 1}\left[-2e^{-\gamma\tau} + (\gamma + 1)e^{-\tau}\right]. \qquad (31)$$

The effective magnetization $\mu \approx c_\infty(\gamma, \tau)$ has a single maximum
in $\gamma$ for each $\tau$; its location $\gamma^*$ is a decreasing function of $\tau$
and satisfies the equation

$$e^{(\gamma - 1)\tau} = \frac{2\left(1 + \gamma^2 + \gamma\tau(\gamma^2 - 1)\right)}{(1 + \gamma)^2}. \qquad (32)$$

Similar considerations as in the small dissipation case
result in

$$\tau_c = \frac{1 + \sqrt{3}}{2}, \qquad (33)$$

above which the optimal $\gamma^* = 1$. In this limit

$$c_\infty(\tau) = c_\infty(\gamma = 1, \tau) = \frac{e^{-\tau}(1 + 2\tau)}{2}. \qquad (34)$$

Finally, the optimal mutual information approaches a
plateau for large $\sigma$ given by

$$I^* \approx \begin{cases} 1 + \tau d \log_2(\tau d / e), & \gamma^* \gg 1,\ \tau \ll 1 \\[4pt] \dfrac{c_\infty(\tau)^2}{2\log 2}, & \gamma^* = 1,\ \tau \gg 1 \end{cases} \qquad (35)$$

and is compared with the numerical result in Fig. 5 ($d = 0.31\ldots$ defined in Appendix C 4).


B. Feedback

We now ask how allowing for feedback between the
output and the input changes the energetic constraints
on the optimally informative solutions. In terms of our
model, this corresponds to saying that the input switching
rates depend on the state of the output ($u_p \neq u_m$
and $d_p \neq d_m$), unlike in the simplest case of the model
discussed in section II A. In circuits with feedback and
without additional inputs the difference between the input
and output is no longer clear: $z$ is an input for $x$ and
vice versa.

We can exploit the symmetry between $+$ and $-$ states
to decrease the number of rates in the network and set the
rates of aligning (and antialigning) of the output to the
input to be equal, regardless of the state of the input.
Specifically, the rates of the general model simplify to the
ones shown in Fig. 6, e.g. $r_p = r_m = r = 1$, $s_p = s_m = s$,
with the input flipping rates reduced to $\alpha \leq 1$ and $y \leq 1$.

We know from our work in the infinite dissipation limit
[38] that the optimal solutions cycle irreversibly through
the four states. The symmetry between $+$ and $-$ states
corresponds to a degeneracy between cycling clockwise
and counterclockwise and by picking this parametrization
of the network we are restricting ourselves to clockwise
cycles without any loss of generality.


[Figure 6 diagram: the four states (x, z) = (output, input), namely (-,-), (+,-), (-,+) and (+,+), connected in a cycle by transitions with rates r, s, α and y.]

FIG. 6: The four network states, with corresponding transition
rates, in a model with feedback where the input $z$ rates
depend on the state of the output variable $x$.

In terms of the rates defined in Fig. 6, the eigenvalues
of the transition rate matrix $\mathcal{L}$ (see Appendix D 1 for details) are
$\{\lambda_i\} = \{0, A, (A - \rho)/2, (A + \rho)/2\}$, where

$$A = 1 + s + y + \alpha, \qquad (36)$$

$$\rho = \sqrt{(1 + s + y + \alpha)^2 - 8(sy + \alpha)}. \qquad (37)$$

The smallest nonzero eigenvalue is always $\lambda = (A - \rho)/2$
and the steady-state probability distribution is given by
the normalized right eigenvector of the null eigenvalue

$$P = \frac{1}{2A}\{1 + y,\ s + \alpha,\ s + \alpha,\ 1 + y\}. \qquad (38)$$

The entropy production after rescaling by the smallest
eigenvalue reads

$$\sigma = \frac{2(\alpha - sy)}{A(A - \rho)}\,\log_2\frac{\alpha}{sy} \qquad (39)$$

and the mutual information is expressed, as before, in
terms of the effective magnetization

$$\mu = e^{-\frac{A\tau}{A - \rho}}\left\{ q \cosh\!\left[\frac{\rho\tau}{A - \rho}\right] - \frac{s^2 - (1 + y)^2 - 4\alpha + \alpha^2 + 2s(2y + \alpha)}{A\rho}\,\sinh\!\left[\frac{\rho\tau}{A - \rho}\right] \right\} \qquad (40)$$

with $q = (1 + y - s - \alpha)/A$ and time rescaled by the
smallest nonzero eigenvalue $\lambda$, as before (see
Appendix D 2 for a detailed calculation of $\mu$).

The nonlinearities of the problem prohibit finding analytical
solutions to this constrained optimization problem,
but we explore some limiting case behavior before
we turn to the full numerical optimization of the problem.
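The spectrum, steady state and entropy production above can be checked directly on a four-state rate matrix. The sketch below is an illustration under an assumed assignment of rates to transitions (output aligns with rate $r = 1$ and anti-aligns with rate $s$; input flips away from the output with rate $\alpha$ and towards it with rate $y$), chosen so that Eq. 38 is reproduced; the state ordering and all function names are hypothetical.

```python
import math


def generator(s, y, alpha, r=1.0):
    """Rate matrix W[i][j] = rate from state j to state i.
    Assumed state order (x, z): 0=(-,-), 1=(-,+), 2=(+,-), 3=(+,+)."""
    rates = {  # (from, to): rate
        (0, 1): alpha, (3, 2): alpha,   # input flips away from the output
        (1, 0): y,     (2, 3): y,       # input flips towards the output
        (1, 3): r,     (2, 0): r,       # output aligns with the input
        (0, 2): s,     (3, 1): s,       # output anti-aligns
    }
    W = [[0.0] * 4 for _ in range(4)]
    for (src, dst), w in rates.items():
        W[dst][src] += w
        W[src][src] -= w                # columns sum to zero
    return W


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(m[k][c]))
        if abs(m[p][c]) < 1e-300:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for k in range(c + 1, n):
            f = m[k][c] / m[c][c]
            for j in range(c, n):
                m[k][j] -= f * m[c][j]
    return d


def spectrum(s, y, alpha):
    """Eqs. 36 and 37."""
    A = 1.0 + s + y + alpha
    rho = math.sqrt(A * A - 8.0 * (s * y + alpha))
    return A, rho


def sigma_rescaled(s, y, alpha):
    """Eq. 39: entropy production in bits, rescaled by (A - rho)/2."""
    A, rho = spectrum(s, y, alpha)
    return 2.0 * (alpha - s * y) / (A * (A - rho)) * math.log2(alpha / (s * y))
```

Since the quoted eigenvalues are decay rates, each $\lambda_i$ should make $\det(W + \lambda_i \mathbb{1})$ vanish, and `sigma_rescaled` should return zero whenever $\alpha = sy$.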


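1. Limit $\sigma = 0$


The completely equilibrium limit of $\sigma = 0$ simplifies
the constraint on the rates in Eq. 39 to

$$\alpha = sy, \qquad (41)$$

which simplifies the effective magnetization in Eq. 40 to

$$\mu = \frac{1 - s}{1 + s}\,e^{-\frac{A\tau}{A - \rho}}\left\{\cosh\!\left[\frac{\rho\tau}{A - \rho}\right] + \frac{A}{\rho}\sinh\!\left[\frac{\rho\tau}{A - \rho}\right]\right\}, \qquad (42)$$

where now $A = (1 + s)(1 + y)$ and $\rho = \sqrt{A^2 - 16sy}$. We can reparametrize the rates as

$$w = \frac{4s}{(1 + s)^2}, \qquad v = \frac{4y}{(1 + y)^2}, \qquad (43)$$

to rewrite

$$\mu = \frac{\sqrt{1 - w}}{2\sqrt{1 - wv}}\left[\left(\sqrt{1 - wv} + 1\right)e^{-\tau} + \left(\sqrt{1 - wv} - 1\right)\exp\!\left(-\tau\,\frac{1 + \sqrt{1 - wv}}{1 - \sqrt{1 - wv}}\right)\right]. \qquad (44)$$

At $\tau = 0$, the effective magnetization is $\mu = \sqrt{1 - w} = (1 - s)/(1 + s)$
and is maximized for $s^* = 0$ (which implies $\alpha^* = s^* y = 0$). At $\tau > 0$, the
optimal effective magnetization is also obtained at $w^* = 0$ (or $s^* = 0$ and
$\alpha^* = 0$ and $y$ is not constrained) and $\mu^* = e^{-\tau}$. The optimal mutual
information $I^*$ is 1 bit for $\tau = 0$ and decays in time as

$$I^* = \frac{1}{2}\left[\log_2\!\left(1 - e^{-2\tau}\right) + e^{-\tau}\log_2\frac{1 + e^{-\tau}}{1 - e^{-\tau}}\right] \text{ bits}. \qquad (45)$$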

This solution corresponds to a circuit where the two
"mixed" states $(x, z) = \{(+,-), (-,+)\}$ are not accessible
($P_{+,-} = P_{-,+} = 0$), while the two "aligned" states
$(+,+)$ and $(-,-)$ have probability 1/2 (see Eq. 38).

This optimal solution describes a completely unresponsive
network with no local fluxes. The values of
the nonzero rates $y$ and $r$ are irrelevant, since they
account for switching from completely forbidden states.
The highest possible value of information transmission
is guaranteed, while remaining in an equilibrium configuration
in which detailed balance is satisfied, but there
is no regulation. If the readout occurs at later times, the
transmitted information decays, however the nature of
the solution remains the same. In summary, the optimal
solution in equilibrium corresponds to a static "dead"
system, which is very informative since the two aligned
states are on average equally sampled, but not necessarily
useful because the timescales for flipping between them
are infinite.



FIG. 7: An energetic representation of the suboptimal network
in perfect equilibrium for the model with feedback in
Fig. 6. Here we show the limiting case where entropy production
$\sigma = 0$ and transition rates are related by the condition
$\alpha = sy$, which results in $E(+,-) = E(-,+) > E(+,+) = E(-,-)$.

The suboptimal solution at $\sigma = 0$ differs from the optimal
unresponsive network in that it has small back reaction
rates for the flipping of the output and input from
the antialigned to the aligned states, $s = \eta$ and $\alpha = \eta y$
(with $\eta \ll 1$). Since these reactions have non-zero rates,
the "mixed" states have probability $\eta/[(1 + y)(1 + \eta)]$
and the system is able to cycle through the four states
and transmit almost 1 bit of information, without dissipating
energy.

We can interpret these suboptimal solutions in terms of
the energetic barriers in the system. We use the detailed-balance
condition in Eq. 41 and the Boltzmann relation,
$P_i = \exp(-E_i/k_B T)$, between the probability $P_i$ and
the energy $E_i$ of state $i$ to express the rates in terms
of the energies of the states and obtain the condition
$E(+,-) = E(-,+) > E(+,+) = E(-,-)$, depicted in
Fig. 7.

As long as the mixed states have a finite energy the
system is able to cycle indefinitely through all the states
at no cost. At $s = 0$ (which implies $\alpha = 0$ from the equilibrium
condition) infinite energy barriers separate the
aligned states and lead to the unresponsive dead solution.
When $s > 0$, the input $z$ controls output $x$, transmitting
information. This suboptimal costless yet informative solution
is possible only because of feedback. In the simpler
model of section II A the only dissipationless solution is
when all rates are equal ($\alpha = y = s = r = 1$, forcing $I$ to
be zero).
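How close the dissipationless suboptimal solution gets to 1 bit can be illustrated at instantaneous readout. The sketch below is a minimal example, assuming the $\tau = 0$ magnetization $\mu = (1 - s + y - \alpha)/(1 + s + y + \alpha)$ (the quantity $q$ of Eq. 40) and the symmetric joint distribution $P(x, z) = (1 + \mu x z)/4$; `equilibrium_point` and the parameter values in the usage note are hypothetical.

```python
import math


def mu_tau0(s, y, alpha):
    """Effective magnetization at tau = 0 (q in Eq. 40)."""
    return (1.0 - s + y - alpha) / (1.0 + s + y + alpha)


def info_bits(mu):
    """Mutual information in bits for P(x, z) = (1 + mu*x*z)/4."""
    return sum(0.5 * p * math.log2(p) for p in (1.0 + mu, 1.0 - mu) if p > 0.0)


def equilibrium_point(eta, y):
    """Suboptimal sigma = 0 solution: s = eta, alpha = eta*y, so that alpha = s*y."""
    s, alpha = eta, eta * y
    mu = mu_tau0(s, y, alpha)
    return mu, info_bits(mu)
```

With, say, `equilibrium_point(1e-3, 0.5)` the magnetization reduces to $(1 - \eta)/(1 + \eta)$, independent of $y$, and the information stays within a few percent of 1 bit.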


2. Limit $\sigma \ll 1$


Using the intuition from the $\sigma = 0$ limit, for a nonzero
but very small $\sigma$ we expand $\alpha$ around the dissipationless
solution, as $\alpha = sy/(1 - \epsilon)$ with $\epsilon \ll 1$.

For clarity, we consider the $\tau = 0$ case where

$$\mu = \frac{1 - s + y - \alpha}{1 + s + y + \alpha}. \qquad (46)$$

The rescaled dissipation, Eq. 39, in terms of $\epsilon$ and the
parametrizations in Eq. 43 is

$$\sigma \approx \frac{wv\,\epsilon^2}{8\left(1 - \sqrt{1 - wv}\right)\log 2}. \qquad (47)$$

Keeping only the leading order term in $\epsilon$, the $\epsilon > 0$ solution
of Eq. 47 is

$$\epsilon = 2\sqrt{2\sigma\log 2\;\frac{1 - \sqrt{1 - wv}}{wv}}. \qquad (48)$$

Expanding the effective magnetization in Eq. 46 to first
order in $\epsilon$ in terms of $w$ and $v$ we find

$$\mu \approx \sqrt{1 - w} + \sqrt{\sigma\log 2}\;\frac{\sqrt{wv\left(1 - \sqrt{1 - wv}\right)}}{\sqrt{2}\left(1 + \sqrt{1 - v}\right)}. \qquad (49)$$

The effective magnetization $\mu$ at fixed $\sigma$ is optimized at
small but nonzero $w = \epsilon \approx 0^+$ (which translates into
small but nonzero $s = \epsilon \approx 0^+$) and $v = 0$ (which sets
$y = 1$). Since the effective magnetization is bounded by
1, the value of $w$ cannot be equal to zero for $\sigma > 0$.
These values set the first term in Eq. 49 to $\sqrt{1 - w} \approx 1$
and maximize the coefficient of $\sqrt{\sigma}$, resulting in $\mu^* \approx 1$
and consequently $I^* \approx 1$ bit.

Unlike in the model without feedback, it is possible
to achieve almost 1 bit of information even for arbitrarily
small entropy production. Since all rates are larger
than 0, the network features all the four states, although
the system spends most of its time in the aligned states
$(+,+)$ and $(-,-)$. The nature of this solution is quantitatively
different than the optimal unresponsive "dead"
network at $\sigma = 0$, showing that even a small amount
of dissipation makes the system responsive. The optimal
solution at $\sigma \ll 1$ is also the suboptimal solution at
$\sigma = 0$.


3. Limit $\sigma \gg 1$


When entropy production is very large, we see from
Eq. 39 that $\sigma$ diverges with $s^* = y^* = 0$. We also
know from previous work [38] that mutual information
is maximized for all delays $\tau$ for these values. To find
$\alpha^*$, we consider the effective magnetization $\mu$ in the limit
$s = y = 0$:

$$\mu(\alpha, \tau) = \frac{1}{2(1 + \alpha)\rho}\left[e^{-\tau}\,\frac{(1 + \alpha + \rho)^2}{1 - \alpha + \rho} + e^{-\tau\frac{1 + \alpha + \rho}{1 + \alpha - \rho}}\,\frac{(1 + \alpha - \rho)^2}{-1 + \alpha + \rho}\right], \qquad (50)$$

where $\rho = \sqrt{1 + \alpha^2 - 6\alpha}$. Expanding $\mu$ to the first order
in $\tau$

$$\mu_{\tau \ll 1}(\alpha, \tau) \approx \frac{1 - \alpha}{1 + \alpha} + \frac{(1 + \alpha + \rho)\,\tau}{2(1 + \alpha)}, \qquad (51)$$

we find $\alpha^*$ that maximizes the above expression

$$\alpha^*(\tau) = \frac{(1 - \tau)\,\tau}{2 - \tau}. \qquad (52)$$

$\alpha^*$ is an increasing function of $\tau$, until it reaches the value
$\alpha^* = 3 - 2\sqrt{2}$ when $\tau_c = 2 - \sqrt{2}$. For such value of $\alpha^*$,
$\rho = 0$ and the two smallest eigenvalues $(A - \rho)/2$ and
$(A + \rho)/2$ become degenerate. Values of $\alpha$ larger than $\alpha^*$
are not optimal, since then $\rho$ would become complex and
oscillations would be detrimental for information transmission [55].
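The degeneracy point quoted above follows from Eq. 52 and the definition of $\rho$ alone, which makes it easy to confirm numerically. A minimal sketch, with hypothetical function names:

```python
import math

TAU_C = 2.0 - math.sqrt(2.0)                # delay at which alpha* stops growing
ALPHA_MAX = 3.0 - 2.0 * math.sqrt(2.0)      # value of alpha* reached at TAU_C


def alpha_star(tau):
    """Eq. 52: optimal anti-aligning input rate at small delay."""
    return (1.0 - tau) * tau / (2.0 - tau)


def rho_squared(alpha):
    """rho^2 = 1 + alpha^2 - 6*alpha in the s = y = 0 limit; negative means complex rho."""
    return 1.0 + alpha * alpha - 6.0 * alpha
```

Evaluating `rho_squared(alpha_star(TAU_C))` gives zero to machine precision, and any larger $\alpha$ makes it negative, which is the oscillatory regime the text excludes.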


4. Numerical results

To generalize the above results to all values of $\sigma$ and
$\tau$ we numerically optimize the information constraining
the rescaled dissipation. As in the circuits without feedback,
the maximum information the circuit is able to
transmit decreases with the time delay of the readout for
all values of $\sigma$, as the system decorrelates with time (see
Fig. 8 a). At small but finite dissipation the decrease is
exponential in $\tau$ and at large readout delays the system
has similar characteristics as the circuit with no feedback:
the optimal network consists of reversible flipping
of both the input and output with large rates (network
$B_f$ in Fig. 9 b). These networks are not useful for transmitting
information, but given the constraints of large
time delay and close to equilibrium solution, better solutions
cannot be found. As described in section II B 1,
at $\sigma = 0$ the optimal solution has the input and output
permanently fixed in the same state, providing perfect
readout but not functioning as a switch. This solution is
obtained with infinitely high energy barriers between the
two aligned states, giving infinite switching times between
these two minima. Decreasing these energy barriers at
small but finite dissipation (or for non-optimal solutions
at $\sigma = 0$) results in a finite lifetime of the two aligned
states, effectively producing a stable switch with very
long lived states (network $D_f$ in Fig. 9 b). These optimal
networks at small but finite dissipation transmit close to
1 bit of information, also at small but finite time delays.
Feedback allows for a switching rate of the input that depends
on the output and optimal circuits have fast rates
for the output and input to align, and slow rates to anti-align,
resulting in larger probabilities that the system
is in the aligned states at the time of the readout and
measurement. At large dissipation and small readout delays,
we recover the same solution as in circuits without
feedback. The input $z$ does not change and the output
quickly aligns with the input (network $C_f$ in Fig. 9 b).
As the readout time increases, the input state switches
and the system decorrelates, causing the transmitted information
to decrease. The large dissipation rate allows
the system to avoid the equilibrium solution of network
$B_f$ in Fig. 9 b and instead cycle through the states with an
alternating combination of fast ($r$, which aligns the input and
output) and slow ($\alpha$, which anti-aligns them) rates (network
$A_f$ in Fig. 9 b). As a result the circuit is more likely
to be found in the aligned states at all times, transmitting
more information. As discussed above and in our previous
work [38], the optimal network topology for large
delays is a negative feedback loop, which is known to oscillate
in certain parameter regimes [56]. Since oscillatory
solutions would decrease the information transmitted at
large delays, avoiding the oscillatory regime sets a limit
to the maximum value of $\alpha$. As dissipation decreases
in the large $\tau$ limit, the rate of aligning $z$ ($\alpha$) decreases
(Fig. 8 c), without having a large effect on the transmitted
information (network $A_f$ in Fig. 9 b). Only at
$\sigma < 1$, when the rates of antialigning the input and output
increase (Fig. 8 b and c), does the transmitted information
decrease.


FIG. 8: (a) Contour plot of optimal mutual information $I^*$ (in bits) as function of the readout delay $\tau$ and entropy production rate $\sigma$ (in bits),
in the presence of feedback. In contrast with the simpler model without feedback, mutual information is now equal to $\approx 1$ bit for any
value of $\sigma$ when $\tau \ll 1$. (b-d) Contour plots of optimal rates $s^*$ (b), $\alpha^*$ (c) and $y^*$ (d) as functions of the readout delay $\tau$ and
entropy production rate $\sigma$, in the presence of feedback.

FIG. 9: (a) Phase diagram in the $(\sigma, \tau)$ plane of optimal rates $s^*, \alpha^*, y^*$ and optimal network topologies $A_f$, $B_f$, $C_f$, $D_f$, $E_f$, in
the presence of feedback. The optimal network topologies are sketched in panel (b). The gray lines in network $D_f$ of panel (b)
denote the back reactions with small rates.

The gain in the transmitted information per dissipation
rate goes to zero at large $\sigma$ values, when the full
non-equilibrium solution is reached, as could be expected.
However also at small dissipation rates there is no increase
in the transmitted information as the system dissipates
more entropy. In this regime the switching rates for
the input $z$ strongly favor the aligned states ($y^* \gg \alpha^*$),
making the transition to the anti-aligned states very unlikely.
The optimal motif is a positive feedback loop.
As $\sigma$ increases the energy barriers between the aligned
and anti-aligned states decrease, since $y$ decreases, but
the qualitative nature of the solution does not change.
Only when the rate that favors cycling through the four
states, $\alpha$, increases does the transmitted information go
up (and the nature of the network changes from $D_f$ to
$A_f$ in Fig. 9 b). In this region, for intermediate values of
$\tau$, the gain in transmitted information per increase in $\sigma$
is the largest.


FIG. 10: Comparison of the mutual information for the optimal and suboptimal solutions. (a-c) Optimal mutual information
$I^*$ as a function of the entropy production rate $\sigma$ for different readout delays $\tau$ and (d-f) as function of the readout delay $\tau$
for different values of the entropy production rate $\sigma$ (expressed in bits). Results from the simulation branch with $y > \alpha$ and
with $y < \alpha$ are shown as dotted red lines and solid cyan lines, respectively. Rates used to compute such mutual information
are shown in Fig. 14 and an accompanying figure. The solutions of the two branches $y > \alpha$ and $y < \alpha$ coincide at large $\sigma$ and
small $\tau$ and at small $\sigma$ and large $\tau$. This happens because the back and forward input flipping rates are equal: in the first case
$y^* \approx \alpha^* \approx 0$, while in the second case $y^* \approx \alpha^* > 0$ (see Fig. 9 a).

Lastly, we compare the information transmitted by the
optimal networks to that transmitted by suboptimal networks
for characteristic values of $\tau$ and $\sigma$ (Fig. 10). We
define the suboptimal networks by dividing the optimization
procedure into two branches: in one branch we constrain
$y > \alpha$, in the other branch $y < \alpha$. In this way we
explore two different topologies. In the first one the system
concentrates on the aligned states $(+,+)$ and $(-,-)$
(like network $D_f$ in Fig. 9 b), while in the second one the
system cycles through the four states in the clockwise direction
(like network $E_f$ in Fig. 9 b). From Fig. 10 we
learn that for $\sigma \to \infty$ the optimal topology is a clockwise
cycle where the system is able to transmit 1 bit of
information [38]. However, when moving towards finite
values of $\sigma$, information transmission decreases, until a
point where the system is confronted with a choice: either
continue to cycle inefficiently with strong back reactions
and reduce $I$, or concentrate the probability distribution
on the aligned states and reach a finite plateau
$I = I_{\sigma = 0}$ (Fig. 10 b). In certain cases (large dissipation
and small and intermediate delays, Fig. 10 a and c) the
two branches coincide and give the same network topologies
with $s^* = 0$ and small input flipping rates ($y^* \approx \alpha^*$).


III. ROBUST OPTIMIZATION 

In many situations a biochemical circuit needs to reliably
respond in many possible external conditions. In
this case, optimization in the typical environment, as the
one discussed in earlier sections, is not the desired criterion.
Such a situation is better described by assuming
that the environment chooses the worst possible conditions
for the network to function. Formally this is captured
by assuming that the system and the environment
play a zero-sum game, where the circuit is trying to maximize
the mutual information between the input and the
output, while the environment is trying to minimize it.

A game theoretic formulation of the problem requires
one to define the strategy space, which in this case
amounts to deciding which variables are controlled by the
circuit and which by the environment. Here we
assume that the system will adjust the transition rates,
whereas the environment controls the initial probability
distribution $P(x_0, z_0)$ of the input $z$ and output $x$.

In other words, we are interested in circuits that are 
optimal for working in the worst possible environmental 
conditions, which in game theoretic terms correspond to 
maximin or “minorant” strategies m- the player has the 
goal of maximizing a function, whereas the opponent has 
the goal of minimizing it. This strategy is also related to 
“robust control” [S31 [S3] . In our case the circuit behaves 
so as to ensure that at least a certain number I of bits 
are transmitted over a given time-scale. 

We look for the networks that are best adapted to the
worst case scenario for the simplest circuit without feedback
presented in section II A in the infinite dissipation
limit. We recall that in this case the input $z$ flips between
the $+$ and the $-$ state with rate $u$, and the output responds
to the input with rate $r$ (see Fig. 11). Since in the
infinite dissipation limit the most informative solutions
always forbid the anti-aligning of the input and output
($s = 0$), for simplicity we consider only circuits with this
constraint.

[Figure 11 diagram: the four states (x, z) = (output, input); the input flips between (-,-) and (-,+) and between (+,-) and (+,+) with rate u.]

FIG. 11: The four network states, with corresponding transition
rates, considered in the maximin optimization where
the input $z$ can either up- or down-regulate the output $x$. $x$
aligns with $z$ with rate $r$.


We consider this problem on the timescales of the system,
which means that the system wants to maximize
the mutual information $I(\tau)$ between the input at time
0 and the output at a time $\tau = t\lambda$, where $\lambda = \min(r, 2u)$
is the minimal non-zero eigenvalue of the transition rate
matrix, the inverse of the system's slowest timescale.
As in section II A we set $r = 1$ to set the units of time.
The effective magnetization, derived based on
the quantities in Appendix E with $s$ set to zero, is:

$$\mu = \mu_0\,e^{-\tau/\lambda} + \frac{1}{1 - 2u}\left(e^{-2u\tau/\lambda} - e^{-\tau/\lambda}\right), \qquad (53)$$

where $\mu > 0$, and $|\mu_0| \leq 1$ encodes the initial condition

$$P(x_0, z_0) = \frac{1 + x_0 z_0 \mu_0}{4}. \qquad (54)$$

Unlike in the cases when we optimized the transmitted information
between the input and output for circuits that
are in steady state in sections II A and II B, in the setup
considered here the initial distribution does not need to
be in steady state. The space of solutions considered here
is the same as the one we considered previously [35], when
we optimized the information transmitted with a delay
in circuits that were out of steady state. There we simultaneously
found the optimal initial distribution and the
parameters of the circuit. Here, we vary the same properties
of the system (initial distribution and flipping rates),
but with a different underlying optimization criterion:
the environment minimizes the transmitted information
by setting the initial distribution and the circuit sets the
flipping rates.

Maximizing the information transmitted in the worst
case scenario in terms of this model takes the form:


• The environment E chooses $\mu_0$ so as to minimize
mutual information, given the rates of the circuit.
This corresponds to finding the value of $\mu_0$ which
makes $\mu$ as small as possible (since $I$ is an increasing
function of $\mu$, in the allowed $\mu > 0$ regime).

• Given $\mu_0$, the circuit S looks for the rate $u$ that
maximizes $I$ (i.e. $\mu$).

The above zero-sum game between the system (circuit)
S and environment E is formalized in terms of their respective
cost functions $\mathcal{F}_S$ and $\mathcal{F}_E$ that satisfy

$$\mathcal{F}_S + \mathcal{F}_E = 0, \qquad (55)$$

where $\mathcal{F}_S = -\mathcal{F}_E = |\mu| = \mathcal{F}(\mu_0, u; \tau)$. The optimization
problem becomes

$$\max_{u}\,\min_{\mu_0}\,\mathcal{F}(\mu_0, u; \tau). \qquad (56)$$

The optimal $\mu_0^*$ chosen by the environment is a function
of $u$ and $\tau$, such that

$$\min_{\mu_0}\,\mathcal{F}(\mu_0, u; \tau) = \mathcal{F}(\mu_0^*(\tau, u), u; \tau), \qquad (57)$$

and the circuit chooses $u^* = u^*(\tau)$ that satisfies

$$\max_{u}\,\mathcal{F}(\mu_0^*(\tau, u), u; \tau) = \mathcal{F}(\mu_0^*(\tau, u^*), u^*; \tau). \qquad (58)$$

To make analytical progress we have to separately consider
the regimes of the two possible smallest eigenvalues
$\lambda = \min(1, 2u)$.
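Before splitting into the two eigenvalue regimes, the maximin problem of Eq. 56 can be illustrated by brute force. The sketch below is a minimal example, assuming the delayed magnetization of Eq. 53 with $r = 1$ and $s = 0$, a payoff $|\mu|$, and hypothetical uniform grids for $u \in (0, 1]$ and $\mu_0 \in [-1, 1]$; the function names are illustrative only.

```python
import math


def mu_delayed(mu0, u, tau):
    """Eq. 53 with r = 1 and s = 0; tau is measured in units of 1/lambda."""
    lam = min(1.0, 2.0 * u)             # smallest nonzero eigenvalue
    t = tau / lam                       # unrescaled time
    if abs(1.0 - 2.0 * u) < 1e-9:       # removable singularity at u = 1/2
        return (mu0 + t) * math.exp(-t)
    return mu0 * math.exp(-t) + (math.exp(-2.0 * u * t) - math.exp(-t)) / (1.0 - 2.0 * u)


def maximin(tau, n_u=400, n_mu0=401):
    """Grid version of max over u of min over mu0 of |mu|."""
    mu0_grid = [-1.0 + 2.0 * k / (n_mu0 - 1) for k in range(n_mu0)]
    best_payoff, best_u, best_mu0 = -1.0, None, None
    for i in range(1, n_u + 1):
        u = i / n_u                                   # circuit's candidate strategy
        # environment's reply: the initial condition that minimizes |mu|
        payoff, mu0 = min((abs(mu_delayed(m, u, tau)), m) for m in mu0_grid)
        if payoff > best_payoff:
            best_payoff, best_u, best_mu0 = payoff, u, mu0
    return best_payoff, best_u, best_mu0
```

For a long delay such as `maximin(5.0)`, the search settles on matched timescales, $u = 1/2$, with the environment pinned at $\mu_0 = -1$, in line with the analysis of the two cases that follows.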


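A. Case $\lambda < 1$

In the regime where $\lambda = 2u < 1$, the input switches
on slower timescales than the output and the effective
magnetization in Eq. 53 is

$$\mu = \mu_0\,e^{-\tau/2u} + \frac{1}{1 - 2u}\left(e^{-\tau} - e^{-\tau/2u}\right). \qquad (59)$$

The best strategy for the environment E would be to
choose $\mu_0$ such that $\mu = 0$. However it is constrained to
fulfill $-1 \leq \mu_0 \leq 1$. Minimizing Eq. 59 with respect to $\mu_0$
subject to the constraint on $\mu_0$ results in:

$$\mu_0^* = \begin{cases} \dfrac{1 - e^{\tau(1 - 2u)/2u}}{1 - 2u}, & \tau < \tau_c(u), \\[6pt] -1, & \tau > \tau_c(u), \end{cases} \qquad (60)$$

with

$$\tau_c(u) = \frac{2u}{1 - 2u}\,\log\big[2(1 - u)\big]. \qquad (61)$$

When $\tau < \tau_c(u)$, the environment is able to set $\mu$ and
thus $I$ to zero. However, when $\tau > \tau_c(u)$, the magnetization
is

$$\mu(-1, u; \tau) = -e^{-\tau/2u} + \frac{1}{1 - 2u}\left(e^{-\tau} - e^{-\tau/2u}\right), \qquad (62)$$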

where $u$ is constrained to be in the interval
$[0, \min(1/2, u_c(\tau))]$ and $u_c(\tau)$ is obtained by inverting
Eq. 61. Given these forms of $\mu_0^*(u, \tau)$ the circuit tries
to maximize the information by tuning $u$ at each value
of $\tau$. In the $\tau > \tau_c(u)$ regime the effective magnetization
is maximized by a $u^*$ that solves

$$\left.\frac{\partial\mu(-1, u; \tau)}{\partial u}\right|_{u^*} = 0. \qquad (63)$$

Generally Eq. 63 needs to be solved numerically, but in
the limit $\tau \ll 1$ we find that when $\tau \to 0$, $u^* \to 0$
sublinearly (see section E 2 for details of the derivation)

$$u^* \approx \frac{\tau}{2\left(-\log\tau + \log\!\left(2(\log\tau)^2\right) - 2\,\dfrac{\log\!\left(2(\log\tau)^2\right)}{\log\tau}\right)}, \qquad \tau \ll 1. \qquad (64)$$

The above solution $u^*$ of Eq. 63 is valid as long as the
smallest eigenvalue $\lambda = 2u < 1$. This choice of $\lambda$ constrains
$u^* < 1/2$, which also constrains $\tau < \tau^*$. Setting the derivative
of $\mu(-1, u; \tau)$ with respect to $u$ to zero at $u^* = 1/2$ (Eq. 65), we get
a condition for $\tau^* = \tau(u^* = 1/2)$ (Eq. 66),
which is fulfilled by $\tau^* = 4$.

In summary, the environment E chooses $\mu_0$ so as to
have $\mu = 0$. However, this is possible only for
$\tau < \tau_c(u)$. In this regime the transmitted information
is always zero and there is nothing the circuit can do
against the judicious choice of the environment. For
$\tau > \tau_c(u)$ the best thing the environment E can do is
to set $\mu_0 = -1$. In order to counteract the strategy of
the environment E, at each readout delay $\tau$ the circuit
S chooses $u < u_c(\tau)$ (with $u_c(\tau)$ obtained by inverting
Eq. 61), such that the environment E is forced into the
regime where the best it can do is $\mu_0 = -1$. In this
regime, the circuit S maximizes the function $\mu(-1, u; \tau)$
in $u \in [0, \min(1/2, u_c(\tau))]$ and finds $u^* = \tau/[2(a^* + \tau)]$, where
$a^*$ is given by the solution of Eq. E7. The maximum value
of the flipping rate for the input $u^*(\tau) = 1/2$ corresponds
to the readout delay $\tau^* = 4$ and marks the transition to
the regime with $\lambda = r = 1$. The effective magnetization
$\mu^*$ at the transition is $3/e^4 \approx 0.05$ and hence $I^* \approx 0.002$.
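The numbers quoted for the transition can be reproduced from Eq. 62. The snippet below is a minimal sketch, assuming the critical delay takes the form $\tau_c(u) = \frac{2u}{1-2u}\log[2(1-u)]$ that follows from setting Eq. 59 to zero at $\mu_0 = -1$, and the symmetric joint distribution $P(x, z) = (1 + \mu x z)/4$ for the information; function names are hypothetical.

```python
import math


def mu_worst_slow_input(u, tau):
    """Eq. 62: mu(-1, u; tau) in the regime lambda = 2u < 1."""
    e_fast = math.exp(-tau / (2.0 * u))
    return -e_fast + (math.exp(-tau) - e_fast) / (1.0 - 2.0 * u)


def tau_c(u):
    """Delay below which the environment can set mu = 0 (assumed form of Eq. 61)."""
    return 2.0 * u / (1.0 - 2.0 * u) * math.log(2.0 * (1.0 - u))


def info_bits(mu):
    """Mutual information in bits for P(x, z) = (1 + mu*x*z)/4."""
    return sum(0.5 * p * math.log2(p) for p in (1.0 + mu, 1.0 - mu) if p > 0.0)


# Approach the boundary u -> 1/2 at the transition delay tau* = 4.
mu_transition = mu_worst_slow_input(0.4999, 4.0)
info_transition = info_bits(mu_transition)
```

`mu_transition` should sit very close to $3/e^4$, and `mu_worst_slow_input(u, tau_c(u))` vanishes, as expected at the boundary between the two branches of Eq. 60.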


B. Case $\lambda = 1$

For $\tau > 4$, the smallest eigenvalue is $\lambda = r = 1$, the
input switches on faster timescales than the output and
the effective magnetization in Eq. 53 is

$$\mu = \mu_0\,e^{-\tau} + \frac{1}{1 - 2u}\left(e^{-2u\tau} - e^{-\tau}\right). \qquad (67)$$

The environment E chooses $\mu_0$ to simultaneously set $\mu = 0$
and fulfill $-1 \leq \mu_0 \leq 1$, which gives

$$\mu_0^* = \begin{cases} \dfrac{e^{-\tau(2u - 1)} - 1}{2u - 1}, & \tau < \tau_c(u), \\[6pt] -1, & \tau > \tau_c(u), \end{cases} \qquad (68)$$

with

$$\tau_c(u) = \frac{1}{2u - 1}\,\log\frac{1}{2(1 - u)}. \qquad (69)$$

If the system S wants to be in the regime $\tau > \tau_c(u)$
where $\mu_0 = -1$, then the circuit must choose a rate
$u^* \in [1/2, u_c(\tau)]$, with $u_c(\tau)$ obtained by inverting Eq. 69.
This choice results in the effective magnetization

$$\mu(-1, u; \tau) = \frac{2(1 - u)\,e^{-\tau} - e^{-2u\tau}}{2u - 1}. \qquad (70)$$

For any $\tau > 4$, the effective magnetization in Eq. 70 is
always maximum at the border $u^* = 1/2$.

In summary, for $\tau > 4$, the optimal response of the
circuit is to set $u^* = 1/2$, forcing the environment into
the $\tau > \tau_c$ regime where the transmitted information is
larger than zero.


$$\left.\frac{\partial\mu(-1, u; \tau)}{\partial u}\right|_{u^* = 1/2} = 0, \qquad (65)$$

we get the condition for $\tau^* = \tau(u^* = 1/2)$

$$\tau^*(\tau^* - 4)\,e^{-\tau^*} = 0. \qquad (66)$$


C. Robust Optimization Solutions

In Fig. 12 we compare the capacities and optimal input
switching rates at fixed readout delay $\tau$ obtained for circuits
optimized given fixed best (broken red line, results
from model A in previous work [38]) and worst (solid blue
line, the maximin strategy discussed in this section) initial
conditions to the results of simply optimizing information
given the system is in steady state in the infinite
dissipation regime presented in section II A (dotted black
line). In the first case the environment first fixes the initial
probability distribution that is most limiting (blue
line) or most favorable (red line) for information transmission
and the circuit then finds the switching rates
that allow it to transmit the most information, possibly
neutralizing the harm of the environment. In the second
case, the initial probability distribution is fixed at
steady state and the circuit optimizes its switching rates
within this constraint. We find that $\tau_c$ is always
zero, such that the worst initial condition always corresponds
to $\mu_0 = -1$ for all $\tau$ and the initial probability
distribution is evenly divided between the mixed states
$\{(-,+), (+,-)\}$, such that $P_0(i) = 1/2$. The best initial
condition has $\mu_0 = +1$ and the initial probability distribution
$P_0(i) = 1/2$ for the aligned states $\{(+,+), (-,-)\}$.
In the latter case $u^* = 1/2$ for all readout delays, and the
circuit functions in a regime where the input timescale
$2u$ and the output timescale $r = 1$ always match. If the
initial distribution is the steady state, $u^*$ is equal to 0
at $\tau = 0$ and increases with $\tau$ until reaching the plateau
$u^* = 1/2$ for $\tau = (1 + \sqrt{3})/2$. If the environment sets
the initial distribution to be the worst possible for information
transmission by the circuit, $u^* = 0$ for $\tau = 0$
and increases much more slowly in $\tau$ than in the steady
state circuit, finally converging to $u^* = 1/2$ for $\tau = 4$.
When the circuit controls the choice of the initial state, it
maximizes the probability of being in the aligned states,
so that output $x$ matches input $z$ and the timescales of
their switching are equal. However, when the environment
chooses the worst initial state, forcing the initial
probability distribution to be in the mixed states, the
circuit requires the output $x$ to react as fast as possible
to the input $z$ ($r \gg 2u$) to align them. Despite these differences,
in all cases the optimal network takes the form
of the same universal network (see Fig. 12 c).

The steady state I* lies in between the optimal information in the maximin case (μ_0 = −1), which we will call I*_min, and the one where the prior is optimized (μ_0 = +1), which we will indicate as I*_max. At τ = 0, all three networks transmit 1 bit of information. The normalized gain (I*_max − I*_min)/I*_max from optimizing the initial condition, compared to the worst possible initial condition the environment can choose, is maximal at a readout delay of τ ≈ 2.5 (see Fig. 13). At this timescale the environment can be most detrimental for information transmission.


FIG. 12: Optimal mutual information I* (a) and optimal input flipping rate u* (b) when the initial condition P_0 corresponds to the stationary state (dotted black line), is optimized by the system (dashed red line) or is set by an antagonistic environment in a maximin game (solid blue line). In panel (c) the optimal topology over the states (x, z) = (output, input) is shown in the three cases (steady state, optimal P_0, maximin): states in red are the ones with initial probability P_0 = 1/2. Each arrow’s thickness is related to the magnitude of the corresponding rate at a fixed delay τ = 1.

FIG. 13: Normalized information transmission (I*_max − I*_min)/I*_max as a function of the readout delay τ. The optimal I*_max corresponds to the case where the system optimizes the initial condition P_0, while I*_min corresponds to the maximin solution, where the environment chooses the worst possible P_0.


IV. DISCUSSION

Most studies that optimize information transmission in biochemical circuits consider ideal conditions and look for the networks that are only limited by intrinsic physical constraints coming from noise in the system. However, cells must often respond to signals under natural external constraints: the readout of the input occurs at a delay, cell energetics are limited, and the environment may be unfavorable, since it need not be tuned to the properties of the network. Here we investigated how these difficulties influence the form of optimal designs of biochemical circuits.

Most generally, the information transmitted by circuits decreases with the readout delay, as the system decorrelates with time. Feedback can slow this decay, but cannot overcome it completely. In the large dissipation limit the optimal solution consists of using a combination of fast rates for output switching and slow rates for input switching to increase the probability of the system being in two states. Our choice of fixing the rate that aligns the input and output states (r = 1) sets these two states to be the aligned states, but the natural symmetry of the system implies that a degenerate solution that transmits the same amount of information exists for the case when the input represses the output, favoring the anti-aligned states. We explicitly discussed these solutions in the infinite dissipation regime in previous work [38]. In the simplest circuit without feedback the only way to achieve this separation into favorable and unfavorable states is by dissipating energy and forbidding back reactions for output switching. Close to equilibrium, in the absence of feedback, the circuit cannot constrain the back reactions and, as a result, the maximum mutual information goes linearly to zero with the entropy production rate σ̂ for all values of the readout delay τ. From the simplest circuit we see that the rate of input flipping depends on the time delay: longer readouts require slow flipping rates of the input to be informative, whereas the ability to dissipate energy allows the circuit to cycle irreversibly through the states by eliminating both the input and output back reactions. The fully non-equilibrium solution is valid for a large range of dissipation values. If long readouts or energy constraints forbid this solution, the circuit effectively becomes randomly stuck in one of two states and is not informative: the input is fixed with an equal probability to be in one of the two states, and the output attempts to align with the input. In summary, the only way for a system without feedback to transmit information is to dissipate energy.

Feedback significantly increases the range of dissipation values at which circuits can be informative. When the output feeds back onto the input, the circuit can transmit ≈ 1 bit of information for any value of σ, even at small delays. Far from equilibrium, the optimal solution cycles through all the states, effectively increasing the decorrelation time of the system. The optimal topology is based on a negative feedback motif with a slowly switching input and a rapidly responding output. Such motifs are very common in stress responses (DNA damage, heat and osmotic shock, and immune response) [58] and often rely on a slow (gene regulation) and a fast (protein-protein interaction) step. In perfect equilibrium, the formally optimal circuit is non-responsive: there is no regulation and the input and output are aligned at all times. However, at small but finite entropy production rates (as well as for the suboptimal solution in perfect equilibrium) the optimal topologies are different from the large dissipation case.

In the presence of feedback, the optimal circuit in the small dissipation range is a positive feedback loop with two stable states (+,+) and (−,−). Such circuits have long been known to be a key mechanism for memory storage [59]. This design of a stable switch is able to convert a transient stimulus into a permanent biochemical response. These circuits have been shown to be crucial for the irreversibility of maturation of Xenopus oocytes [60] and for long-lasting synaptic plasticity [61]. It has also been argued that positive feedback may have a role in enhancing switch-like responses (e.g. in MAP kinase cascades) and improving energetic efficiency by filtering out noise [62]. This may explain why we find such an optimal topology in the small dissipation regime.

The above examples show that the optimal topology at small dissipation rates is characteristic of stable long-term readouts that commit the cell to one of two responses. The aligned (or anti-aligned in the other degenerate topology) states are very stable and large energetic barriers exist to exit these states, resulting in the positive feedback motif being optimal. Conversely, the optimal motif in the large dissipation limit is a negative feedback loop, which is characteristic of a shock response: a transient response that is easily exited, but needs to be implemented quickly. It is therefore a typical non-equilibrium response, whereas the positive feedback loop is characteristic of slow and stable equilibrium situations.

Intuitively, dissipating more energy allows for larger information transmission because it lowers the probability of back reactions, which are detrimental when processing a signal. Interestingly, in the presence of feedback the system is able to build a particular topology which is suboptimal in terms of information transmission but which does not dissipate energy at all. The resulting network is such that effectively the system can cycle either in the clockwise or counterclockwise direction and the probability distribution is mostly concentrated on the aligned states (+,+) and (−,−). Such costless network topologies could be of inspiration when designing synthetic biochemical circuits aimed at energy production.


Feedback is able to slow down the decrease of information transmission with readout delay, but does not change the monotonic nature of this process, which is caused by decorrelation of the states of the circuit. Yet feedback does alter the dependence of the information decay on dissipation compared to circuits without feedback. At large as well as small dissipation rates the capacity plateaus, leaving a small range of σ values where the transmitted information is sensitive to the precise magnitude of the energy constraints. This relatively narrow regime is where the optimal motif changes from a positive feedback loop to a negative feedback loop. Effectively, in this regime the feedback is turned off (the back and forth input flipping rates are similar) and the circuit resembles the simple system discussed in section II A.


Our optimal network for information transmission with large energy dissipation at relatively large readout delays (circuit Ef in Fig.) has the same design as the two-component signaling network in Escherichia coli used in osmoregulation [63-65]. This network is composed of the histidine kinase EnvZ and the response regulator OmpR and it is aimed at reacting to an osmotic shock by regulating the expression of two porin proteins, OmpF and OmpC. After phosphorylation by EnvZ, OmpR undergoes a conformational change, dimerizes and binds to the porin promoter region of either the ompF or the ompC gene. We can map the activation of our input z to the process of phosphorylation and conformational change of OmpR, and the activation of our output x to the dimerization and binding to DNA of OmpR. Conversely, the deactivation of x corresponds to unbinding of OmpR from the DNA, while the deactivation of z corresponds to the dephosphorylation of OmpR. Detailed experimental studies of the energetics of this system show that phosphorylation-activated dimerization drives an increase in DNA binding [66], suggesting that the biochemical regulation is a clockwise cycle such as presented in network Ef of Fig.


In the large dissipation limit we compared three different conditions in which a circuit optimizes the information transmitted at a delay for a model without feedback: a circuit that functions in steady state (section II A), one that is able to optimize its input distribution (derived previously [35]), and one that is forced to function with the least informative initial distribution (maximin, section III). Interestingly, all solutions share the same circuit topology and type of solution. The most informative solution is to cycle irreversibly through the four states. The difference between the three cases lies in the rate of flipping the input signal at a given delay. The most informative of the three strategies, where the circuit has coevolved to match the environmental conditions, displays the largest flipping rate of the input (although still small compared to the flipping rate of the output), which is independent of the readout delay. The least informative circuit, the one that functions in an adverse environment, has the slowest flipping rate of the input. Intuitively, if the statistics of the environment and circuit match, then as long as these initial states are long lived the ability of the system to transmit information is mainly encoded in these states. However, in an adverse environment, extremely small flipping rates of the input stabilize the initial input states, allowing for a more informative readout. Since the same circuit, just with different flipping rates of the input, works optimally in both favorable and antagonistic environmental conditions, one could imagine that the rate of input switching could be tuned depending on the environmental conditions. This tuning could be achieved by fast degradation of a "typical" sugar source (like glucose) but a slower degradation that requires additional elements (such as production of the enzyme beta-galactosidase) for degradation of a less typical sugar source (like lactose).


The models of biochemical regulation we consider assume the limit of very sharp response functions, which simplifies their description to two-state systems. As was previously shown, on one hand smooth regulatory functions can transmit more than 1 bit of information […], and on the other hand the molecular noise coming from discrete particle numbers limits the capacity […]. The capacity and regulatory details of the optimal systems can change if we consider more detailed molecular models. However, even these simple models show general principles of how energy constraints and delayed readout drive optimal topologies. It has previously been argued using more detailed models that a truly bistable system in equilibrium is not optimal for transmitting information, unless the system does not have time to equilibrate and manages to retain memory of the initial condition [70]. The solutions we observe in our optimal networks with feedback at small dissipation correspond to circuits that manage to retain the memory of the initial state.

All the models we considered, both in equilibrium and out of equilibrium, corresponded to two-component systems. These types of networks were previously studied as circuits that can function out of equilibrium, in contrast with one-component signaling systems that must obey detailed balance […]. When it comes to the precision of a continuous gradient readout, it was shown that fueling energy into the system makes it possible to overcome the limitations posed by detailed balance, by decoupling the output and receptor molecules and providing a stable readout of the input. In our discrete two-component system, this stable readout of the input state is possible even at equilibrium with a circuit design that is able to stably store the input state by exploiting timescale separation and favoring the aligned states over the non-aligned ones. However, such a stable solution is not very useful for responding to signals that change on fast timescales. In that case, energy dissipation is indispensable for an informative readout.


Appendix A: General form of the rate matrix ℒ


The transition rate matrix ℒ of Equation in the main text is given in its general form by


C = 



-Tn 

0 


0 


dr) 


■ (Al) 


When ℒ is analytically diagonalizable, its eigenvalues λ_α, its right eigenvectors v_α and its left eigenvectors u_α^T (with α = 1, ..., 4) satisfy

\[ \mathcal{L}\, v_\alpha = \lambda_\alpha v_\alpha, \tag{A2} \]

\[ u_\alpha^T \mathcal{L} = u_\alpha^T \lambda_\alpha, \tag{A3} \]

\[ u_\alpha^T v_\beta = \delta_{\alpha\beta}. \tag{A4} \]

The steady-state probability vector P_∞ is equal to the right eigenvector v_1, which corresponds to the null eigenvalue λ_1 and which is given by

/ dprmrp+dm(dj,rm+Sp(rm+Up)) 


1 


Vi = 


Er=i^i(*) 


Pm TpH-r 

n~\~{dr, 

nSm+rp{s„ 

z-\-Um 

.))Up 

dpTmUr 

n+Sp( 

PmUm^{Sr, 


',)Up) 

Vry^rpU^ 

n-\-{dr, 

nSm+Tp^Sr, 


.))Up 

dm S-rr 

1 {dp-\- 

Sp)+dprp{i 

im-\-U 

p^) 

r^PpUr 

n.~\~{dr, 

aSm+rp{Sp 


. ))'Hp 


and entropy production rate is 

j'd'm^^mdpTryi 

a = J^og- - 


(Bll) 


(A5)


Appendix C: Information transmission with energy dissipation: the simplest model


Appendix B: Computation of the entropy production rate σ

Here we perform the calculation of the entropy production rate σ for the general dynamic system described by the transition rate matrix ℒ and pictured in Fig. We start from the definition of σ, introduced in Eq. in the main text:

\[ \sigma = \sum_{i,j} P_i w_{ij} \log\frac{w_{ij}}{w_{ji}}, \tag{B1} \]

where P_i is the steady state probability distribution P_∞ for state i. In our specific case, we explicitly have

\[ \sigma = P_1 w_{12}\log\frac{w_{12}}{w_{21}} + P_2 w_{24}\log\frac{w_{24}}{w_{42}} \tag{B2} \]
\[ \quad + P_4 w_{43}\log\frac{w_{43}}{w_{34}} + P_3 w_{31}\log\frac{w_{31}}{w_{13}} \tag{B3} \]
\[ \quad + P_2 w_{21}\log\frac{w_{21}}{w_{12}} + P_4 w_{42}\log\frac{w_{42}}{w_{24}} \tag{B4} \]
\[ \quad + P_3 w_{34}\log\frac{w_{34}}{w_{43}} + P_1 w_{13}\log\frac{w_{13}}{w_{31}}. \tag{B5} \]

After collecting similar terms we can write

\[ \sigma = J_{12}\log\frac{w_{12}}{w_{21}} + J_{24}\log\frac{w_{24}}{w_{42}} \tag{B6} \]
\[ \quad + J_{43}\log\frac{w_{43}}{w_{34}} + J_{31}\log\frac{w_{31}}{w_{13}}, \tag{B7} \]

where we have used the definition of the probability current J_ij = P_i w_ij − P_j w_ji, introduced in Section in the main text, and we have considered the clockwise cycle in Fig. All the steady state currents are equal to each other:

\[ J_{12} = J_{24} = J_{43} = J_{31} = J, \tag{B8} \]

and the entropy production rate σ is

\[ \sigma = J \log\frac{w_{12}\, w_{24}\, w_{43}\, w_{31}}{w_{21}\, w_{13}\, w_{34}\, w_{42}}. \tag{B9} \]
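The equivalence between the sum over all transitions (Eq. B1) and the single-cycle form (Eqs. B8 and B9) can be checked numerically on a concrete rate set. The sketch below is a minimal illustration, not the authors' code. It assumes the rate layout of the simplest model (input flips at rate u in both directions, output aligns at rate 1 and misaligns at rate s), an assumed stationary distribution proportional to (1+u, u+s, u+s, 1+u), natural logarithms (divide by ln 2 to obtain bits), and hypothetical function names.

```python
import math

def simplest_model_rates(u, s):
    """Rates w[(i, j)] from state i to state j, states numbered 1..4.
    Assumed layout: states 1 and 4 are aligned, 2 and 3 are mixed."""
    return {
        (1, 2): u, (2, 1): u,      # input flips between states 1 and 2
        (3, 4): u, (4, 3): u,      # input flips between states 3 and 4
        (2, 4): 1.0, (4, 2): s,    # output aligns (rate 1) / back reaction (rate s)
        (3, 1): 1.0, (1, 3): s,
    }

def stationary_distribution(u, s):
    """Assumed steady state of the layout above (checked by the residual below)."""
    norm = 2.0 * (1.0 + s + 2.0 * u)
    return {1: (1 + u) / norm, 2: (u + s) / norm,
            3: (u + s) / norm, 4: (1 + u) / norm}

def max_stationarity_residual(w, p):
    """Largest |inflow - outflow| over all states; zero at steady state."""
    return max(
        abs(sum(p[i] * w[(i, j)] for (i, j) in w if j == k)
            - sum(p[k] * w[(i, j)] for (i, j) in w if i == k))
        for k in p
    )

def entropy_production_sum(w, p):
    """Eq. B1: sum over all directed transitions (requires all rates > 0)."""
    return sum(p[i] * w[(i, j)] * math.log(w[(i, j)] / w[(j, i)])
               for (i, j) in w)

def entropy_production_cycle(w, p):
    """Eqs. B8-B9: one current J around the cycle 1 -> 2 -> 4 -> 3 -> 1."""
    current = p[1] * w[(1, 2)] - p[2] * w[(2, 1)]
    forward = w[(1, 2)] * w[(2, 4)] * w[(4, 3)] * w[(3, 1)]
    backward = w[(2, 1)] * w[(4, 2)] * w[(3, 4)] * w[(1, 3)]
    return current * math.log(forward / backward)
```

With s = 1 the forward and backward products coincide, the current vanishes, and both expressions return zero, which is the detailed-balance limit.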


We plug the rates of the transition matrix ℒ (see Eq. A1) and the stationary distribution P_∞ (see Appendix A) into Eq. B9. The current J is


J = 


Ja P Jb P Jc 


(BIO) 


with 

Ja = 
Jb = 
Jc = 


dp^Tptii^'b'p P Uyn) P P 1 

dm{dp{Tpn p Sjn) P ^m^p P ^m^p P ^m'^p T ^p'^p): 

(Vp A Sp'ji^TpYiUjn P {^Sjn p Um^Up) ^ 


1. Diagonalization of ℒ


In the simplest model described in section II A the transition rate matrix ℒ has the form

\[ \mathcal{L} = \begin{pmatrix} u+s & -u & -1 & 0 \\ -u & u+1 & 0 & -s \\ -s & 0 & u+1 & -u \\ 0 & -1 & -u & u+s \end{pmatrix}. \]

Its eigenvalues λ_α are 0, 2u, 1 + s and 1 + s + 2u;

its right eigenvectors v_α are


£00 
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V 3 = 
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A i 

i A 2u) 


1 

2(1 

A i 

! — 2u) 


1 

2(1 

A i 
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M A s 


2(1 A s A 2i/) 


and its left eigenvectors are 

'uj = (1,1,1,1), 
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s — u 
s — u , 

u — 1 
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(Cl) 


(C2) 


(C3) 


(C4) 


(C5) 


(C6) 


The stationary state P_∞ is equal to the first right eigenvector v_1, related to the null eigenvalue λ_1.


2. Limit τ ≪ 1

Here we detail the solutions of the equations presented in the small τ limit of the simplest model in section II A. In the small dissipation limit we assume

\[ \gamma^* \approx \frac{a_0}{\tau} + b_0 + c_0 \tau \tag{C7} \]

and find the coefficients of the expansion by solving ∂μ/∂γ = 0 (with μ given by Eq. 19) order by order in τ. The coefficient a_0 = 0.96... is given by the solution of the transcendental equation e^{a_0} = (4/3)(a_0 + 1), while b_0 and c_0 are given respectively by

—5 — 2ao 


bt) = I 


6 ag 


= —0.25... and 


—75 — 65ao + 128ao + 28ao 
= -72^1- 

Using Eq. C7 for γ* gives

\[ \mu^* \approx \sqrt{\hat\sigma}\,\left(1 + A_0 \tau + B_0 \tau^2\right), \tag{C8} \]

with


An — 


- 4 e-ao 3 
2 ao 


— 1 = —0.24... and 


D o —oq ~5 —7aQ+aQ+6ao , 10+4aQ —g^ —12aQ+4aQ O Ol 

£jQ o 2 aQ ' U.Ul. 




Similarly, in the large dissipation limit we take γ* ≈ a_∞/τ + b_∞ + c_∞ τ and, following the same procedure as above with μ given by Eq., we find a_∞ = 1.68... as the solution of the transcendental equation e^{a_∞} = 2(a_∞ + 1), while b_∞ and c_∞ are given by


boo — 1 


2 do 


= —0.31... and 


Coo - 


— 12 — 12aoo + 


3ai, 


2aY 


= 0.07... 


The effective magnetization is

\[ \mu^* \approx 1 + A_\infty \tau + B_\infty \tau^2, \tag{C9} \]

with


Aoo — 


Boo = 


- 2 e- 


1 


— 1 = —0.63... and 


^ (—4—6goo+2g^)4-24-aoo —g^) 


= 0.23... 


3. Limit σ̂ ≪ 1

In the limit of σ̂ ≪ 1 we assume that μ ≈ c(γ, τ)√σ̂ and s = 1 − ε, generalizing the σ̂ = 0 behavior. Solving Eq. 25 for ε we obtain the form of c(γ, τ) in Eq. 27. For each value of τ the function c(γ, τ) has a single maximum in γ*, which is a decreasing function of τ and satisfies the transcendental equation

,(V-.)., P+7-+27-^ + 2Vr(V=-l) ^ (CIO) 

(1 + 7*)(1 + 37*) ^ 

The maximum of c(γ, τ) is found from ∂c/∂γ ∝ F(γ, τ)/(γ − 1)² = 0, where F(γ, τ) = 0 is solved by Eq. C10 for γ > 1. In the limit γ → 1⁺, the maximum is found as the solution of e^{−τ}(1 + 2τ − 4τ²)/(4√2) = 0, and at

\[ \tau = \frac{1+\sqrt{5}}{4} \tag{C11} \]

the maximum of c(γ, τ) reaches γ* = 1.


4. Limit σ̂ ≫ 1

In the limit of large dissipation σ̂ and delay τ ≪ 1 it is possible to write the optimal mutual information as

\[ I^* \approx 1 + \tau \hat a \log_2\!\left(\tau \hat a / e\right), \tag{C12} \]

where â = 0.31... is defined as

\[ \hat a = \frac{a_\infty}{2(a_\infty + 1)}, \tag{C13} \]

with a_∞ = 1.68... introduced in Appendix C 2.

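Appendix D: Information transmission with energy dissipation: feedback

1. Diagonalization of ℒ

In the model where feedback is present (see section II B), the transition rate matrix ℒ has the form

\[ \mathcal{L} = \begin{pmatrix} s+\alpha & -y & -1 & 0 \\ -\alpha & 1+y & 0 & -s \\ -s & 0 & 1+y & -\alpha \\ 0 & -1 & -y & s+\alpha \end{pmatrix}. \tag{D1} \]

It is useful to introduce the quantities A and ρ, defined in the main text as

\[ A = 1 + s + y + \alpha, \tag{D2} \]

\[ \rho = \sqrt{(1 + s + y + \alpha)^2 - 8\,(s y + \alpha)}. \tag{D3} \]

Then the eigenvalues λ_α can be written as

\[ \lambda_1 = 0, \qquad \lambda_2 = A, \qquad \lambda_3 = \tfrac{1}{2}(A - \rho), \qquad \lambda_4 = \tfrac{1}{2}(A + \rho). \tag{D4} \]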
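The eigenvalues in Eq. D4 can be verified by direct matrix-vector multiplication. The following sketch is an illustration, not part of the original derivation: it assumes the matrix layout of Eq. D1 as written here, uses unnormalized eigenvector directions that follow from the symmetry of that matrix, and requires parameter values for which the argument of the square root in Eq. D3 is positive. All names are hypothetical.

```python
import math

def feedback_matrix(s, y, alpha):
    """Rate matrix of the feedback model (assumed layout of Eq. D1)."""
    return [
        [s + alpha, -y,       -1.0,     0.0],
        [-alpha,    1.0 + y,  0.0,      -s],
        [-s,        0.0,      1.0 + y,  -alpha],
        [0.0,       -1.0,     -y,       s + alpha],
    ]

def matvec(matrix, vector):
    return [sum(row[j] * vector[j] for j in range(4)) for row in matrix]

def eigenpairs(s, y, alpha):
    """(eigenvalue, unnormalized right eigenvector) pairs for Eq. D4."""
    big_a = 1.0 + s + y + alpha
    rho = math.sqrt(big_a ** 2 - 8.0 * (s * y + alpha))  # needs a positive argument
    p3 = -1.0 + s - y + alpha - rho
    p4 = 1.0 - s + y - alpha - rho
    return [
        (0.0, [1.0 + y, s + alpha, s + alpha, 1.0 + y]),
        (big_a, [1.0, -1.0, -1.0, 1.0]),
        (0.5 * (big_a - rho), [p3, 2.0 * (s - alpha), -2.0 * (s - alpha), -p3]),
        (0.5 * (big_a + rho), [p4, 2.0 * (alpha - s), -2.0 * (alpha - s), -p4]),
    ]

def max_residual(s, y, alpha):
    """Largest component of L v - lambda v over all four eigenpairs."""
    matrix = feedback_matrix(s, y, alpha)
    worst = 0.0
    for lam, vec in eigenpairs(s, y, alpha):
        image = matvec(matrix, vec)
        worst = max(worst, max(abs(image[i] - lam * vec[i]) for i in range(4)))
    return worst
```

Setting y = α = u recovers the simplest model without feedback, for which ρ reduces to |1 + s − 2u|.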
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The right eigenvectors Va are given by 


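\[ v_1 = \frac{1}{2A}\begin{pmatrix} 1+y \\ s+\alpha \\ s+\alpha \\ 1+y \end{pmatrix}, \tag{D5} \]

\[ v_2 = \frac{s+\alpha}{2A}\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix}, \tag{D6} \]

\[ v_3 = \frac{1}{4\rho}\begin{pmatrix} +(-1+s-y+\alpha-\rho) \\ +2(s-\alpha) \\ -2(s-\alpha) \\ -(-1+s-y+\alpha-\rho) \end{pmatrix}, \tag{D7} \]

\[ v_4 = -\frac{1}{4\rho}\begin{pmatrix} +(1-s+y-\alpha-\rho) \\ +2(-s+\alpha) \\ -2(-s+\alpha) \\ -(1-s+y-\alpha-\rho) \end{pmatrix}, \tag{D8} \]

and the left eigenvectors u_α^T are

\[ \begin{aligned}
u_1^T &= (1,\, 1,\, 1,\, 1), \\
u_2^T &= \left(1,\, -\frac{1+y}{s+\alpha},\, -\frac{1+y}{s+\alpha},\, 1\right), \\
u_3^T &= \left(-1,\, \frac{2(1-y)}{1-s+y-\alpha+\rho},\, -\frac{2(1-y)}{1-s+y-\alpha+\rho},\, 1\right), \\
u_4^T &= \left(1,\, -\frac{2(1-y)}{1-s+y-\alpha-\rho},\, \frac{2(1-y)}{1-s+y-\alpha-\rho},\, -1\right).
\end{aligned} \tag{D9} \]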


The stationary state P_∞ is equal to the first right eigenvector v_1, related to the null eigenvalue λ_1.


2. Computing μ

In order to compute the effective magnetization μ in the presence of feedback, we recall Eq., which is valid in general and which relates the joint probability distribution P(x_t, z_0) with μ. We then express P(x_t, z_0) as in Eq. in the main text:

\[ P(x_t, z_0) = \sum_{z_t,\, x_0 = \pm 1} P(x_t, z_t, t \,|\, x_0, z_0, 0)\, P_0(x_0, z_0), \tag{D10} \]

where P_0(x_0, z_0) is the initial distribution of the system, corresponding to the stationary state P_∞ = v_1(x_0, z_0), while the conditional probability P(x_t, z_t, t | x_0, z_0, 0) can be written as

\[ P(x_t, z_t, t \,|\, x_0, z_0, 0) = \sum_{i=1}^{4} e^{-\lambda_i t}\, u_i^T(x_0, z_0)\, v_i(x_t, z_t). \tag{D11} \]

v_i denotes the i-th right eigenvector and u_i^T the i-th left eigenvector, and we make the dependence on x and z explicit as we are going to exploit it in the subsequent algebraic manipulations.

We recall the definitions of A and ρ (Equations D2 and D3) and introduce the additional quantities

\[ q = \frac{1 + y - s - \alpha}{A}, \tag{D12} \]

\[ m = \frac{s + \alpha}{2A}. \tag{D13} \]

We rewrite the right eigenvectors of Appendix D 1 as

\[ v_1(x,z) = \tfrac{1}{4}\,(1 + q\, x z), \tag{D14} \]

\[ v_2(x,z) = m\, x z, \tag{D15} \]

\[ v_3(x,z) = \frac{1 + y - 3s + \alpha + \rho}{8\rho}\, x + \frac{1 + y + s - 3\alpha + \rho}{8\rho}\, z, \tag{D16} \]

\[ v_4(x,z) = \frac{1 + y - 3s + \alpha - \rho}{8\rho}\, x + \frac{1 + y + s - 3\alpha - \rho}{8\rho}\, z, \tag{D17} \]

and define

\[ h = \frac{1 + y}{s + \alpha}, \tag{D18} \]

\[ a = -\tfrac{1}{2}\,(1 + s - 3y + \alpha - \rho), \tag{D19} \]

\[ b = -\tfrac{1}{2}\,(-3 + s + y + \alpha - \rho), \tag{D20} \]

\[ c = 1 - s + y - \alpha + \rho, \tag{D21} \]

\[ e = \tfrac{1}{2}\,(1 + s - 3y + \alpha + \rho), \tag{D22} \]

\[ f = \tfrac{1}{2}\,(-3 + s + y + \alpha + \rho), \tag{D23} \]

\[ g = 1 - s + y - \alpha - \rho. \tag{D24} \]

Having done that, the left eigenvectors of Appendix D 1 now read

\[ u_1(x,z) = 1, \tag{D25} \]

\[ u_2(x,z) = \frac{1-h}{2} + \frac{1+h}{2}\, x z, \tag{D26} \]

\[ u_3(x,z) = \frac{a x + b z}{c}, \tag{D27} \]

\[ u_4(x,z) = \frac{e x + f z}{g}. \tag{D28} \]


Now, by plugging Eq. D11 into Eq. D10 we are able to write P(x_t, z_0) as

\[ P(x_t, z_0) = \sum_{i=1}^{4} e^{-\lambda_i t}\, A_i(z_0)\, B_i(x_t), \tag{D29} \]

where

\[ A_i(z_0) = \sum_{x_0 = \pm 1} u_i^T(x_0, z_0)\, v_1(x_0, z_0), \tag{D30} \]

\[ B_i(x_t) = \sum_{z_t = \pm 1} v_i(x_t, z_t). \tag{D31} \]

Computing the terms A_i and B_i (with i = 1, ..., 4) we obtain:

\[ A_1(z_0) = 1/2, \tag{D32} \]

\[ A_2(z_0) = \tfrac{1}{4}\left(1 - h + (1 + h)\, q\, z_0^2\right), \tag{D33} \]

\[ A_3(z_0) = \frac{(b + a q)\, z_0}{2c}, \tag{D34} \]

\[ A_4(z_0) = \frac{(f + e q)\, z_0}{2g}, \tag{D35} \]

and

\[ B_1(x_t) = 1/2, \tag{D36} \]

\[ B_2(x_t) = 0, \tag{D37} \]

\[ B_3(x_t) = \frac{x_t\,(1 - 3s + y + \alpha + \rho)}{4\rho}, \tag{D38} \]

\[ B_4(x_t) = \frac{x_t\,(1 - 3s + y + \alpha - \rho)}{4\rho}. \tag{D39} \]

Plugging in all the above expressions into Eq. D29 we compute the effective magnetization μ, which is


fjL = exp 





(1 + j/)^ — da + + 2s{2y + a)] 

Ap 



(D40) 


3. Numerical results: optimal rates

In this section we show the optimal rates resulting from numerical optimization. As discussed in the main text, optimization is performed as two separate branches: one where we fix y > α, and the other where we set y < α. Results from both branches are shown in Figs. 14 and 15.

In Fig. 14 we show the optimal rates as functions of the rescaled dissipation σ, for different values of the delay τ. Such rates are used to calculate the optimal mutual information shown in Fig. 10 a-c in the main text. In Fig. 15 we show the dependence of the optimal rates on the readout delay τ, for different values of σ. These correspond to I* shown in Fig. 10 d-f in the main text.

FIG. 14: Optimal rates as functions of rescaled dissipation σ for different readout delays τ. Results from the simulation branch with y > α and with y < α are shown in panels (a-c) and (d-f), respectively. These rates are used to compute the optimal mutual information I* of Fig. 10 a-c in the main text.


Appendix E: Robust optimization

1. Diagonalization of ℒ

In the maximin model described in section III the transition rate matrix ℒ has the form

C = 


Its eigenvalues Aq, are 



Its right eigenvectors Va are 


Poo - - 


V2 = 


V3 = 


1 


M -b 1 
U 

2(1 + 2u) \ u 
M -b 1 


2(1-2u) -1 

-1 


1 


2(1 - 2u) 



/ 

u 

—u 

-1 





—u 

1 -b M 

0 


u 

-1 


0 

0 

1 -b u 

^ <“) 

“ 2(1-b2u) 

-1 

V 

0 

-1 

—u 

u ) 


V+I/ 


(E2) 


(E3) 


(E4)


(E5)

FIG. 15: Optimal rates {s*, y*, α*} as functions of the readout delay τ for different values of the entropy production rate σ (measured in bits). Results from the simulation branch with y > α and with y < α are shown in panels (a-c) and (d-f), respectively. These rates are used to compute the optimal mutual information I* of Fig. 10 d-f in the main text.


and its left eigenvectors u_α^T are

\[ u_1^T = (1,\, 1,\, 1,\, 1), \quad \ldots, \quad (-1,\, 1,\, -1,\, 1). \tag{E6} \]

The stationary state P_∞ is equal to the first right eigenvector v_1, related to the null eigenvalue λ_1.


2. Solution for τ ≪ 1

In this section we derive the asymptotic behavior of u* as τ → 0 given in Eq. 64. In general, Eq. results in a transcendental equation for the auxiliary variable a (related to the optimal rate through u* ≈ τ/(2a) for τ ≪ 1, cf. Eq. E14):

\[ e^{a} = 1 + a + \frac{2a^2}{\tau}, \tag{E7} \]

which needs to be solved numerically. However, in the limit τ ≪ 1 we can analytically solve Eq. E7.

When τ → 0, in order for the left hand side (l.h.s.) of Eq. E7 to match the leading order term 1/τ of the right hand side (r.h.s.) of Eq. E7, a must be of the form

\[ a = -\log\tau + b. \tag{E8} \]

Hence Eq. E7 becomes

\[ \frac{e^{b}}{\tau} = 1 + b - \log\tau + \frac{2}{\tau}\,(b - \log\tau)^2. \]

Multiplying both sides by τ gives

\[ e^{b} = (1 + b - \log\tau)\,\tau + 2b^2 - 4b\log\tau + 2(\log\tau)^2. \]

The leading order term of the r.h.s. for τ → 0 is 2(log τ)², thus the above equation becomes

\[ e^{b} \approx 2(\log\tau)^2, \qquad \tau \to 0, \]

which implies that b has the form

\[ b = \log\!\left(2(\log\tau)^2\right) + c. \tag{E9} \]

Plugging b from Eq. E9 into the equation for e^b above we obtain

\[ 2(\log\tau)^2\, e^{c} \approx 2(\log\tau)^2 - 4\left(\log(2(\log\tau)^2) + c\right)\log\tau + 2\left(\log(2(\log\tau)^2) + c\right)^2 + \ldots \tag{E10} \]

and dividing by 2(log τ)², we have

\[ e^{c} = 1 - 2\,\frac{\log(2(\log\tau)^2) + c}{\log\tau} + \cdots \approx e^{-2\,\log(2(\log\tau)^2)/\log\tau}, \]

which finally implies that

\[ c = -2\,\frac{\log(2(\log\tau)^2)}{\log\tau}. \tag{E11} \]

To sum up, when τ ≪ 1 one can write a as

\[ a \approx -\log\tau + \log\!\left(2(\log\tau)^2\right) - 2\,\frac{\log(2(\log\tau)^2)}{\log\tau} = \log\!\left(\frac{2(\log\tau)^2}{\tau}\right) - 2\,\frac{\log(2(\log\tau)^2)}{\log\tau}. \tag{E12} \]

When τ → 0, a diverges as

\[ a \approx \log\!\left(\frac{2(\log\tau)^2}{\tau}\right) - 2\,\frac{\log(2(\log\tau)^2)}{\log\tau}, \qquad \tau \ll 1, \tag{E13} \]

and u* goes to zero with τ in a strongly sublinear way:

\[ u^* \approx \frac{\tau}{2\left(-\log\tau + \log(2(\log\tau)^2) - 2\,\dfrac{\log(2(\log\tau)^2)}{\log\tau}\right)}. \tag{E14} \]
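The quality of the asymptotic solution can be gauged by solving the transcendental equation numerically and comparing it with the expansion of Eq. E12. The sketch below is a minimal illustration, assuming the form e^a = 1 + a + 2a²/τ for Eq. E7 and restricting attention to small delays, where the non-trivial root is well separated from the trivial root a = 0. The function names and the bracketing strategy are choices made for this sketch only.

```python
import math

def residual(a, tau):
    """e^a - 1 - a - 2 a^2 / tau, written with expm1 for accuracy at small a."""
    return math.expm1(a) - a - 2.0 * a * a / tau

def solve_a(tau, lo=1e-3):
    """Non-trivial positive root of the assumed Eq. E7, found by bisection."""
    if residual(lo, tau) >= 0.0:
        raise ValueError("lower bracket is not below the non-trivial root")
    hi = 1.0
    while residual(hi, tau) <= 0.0:   # expand until the sign changes
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid, tau) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def a_asymptotic(tau):
    """Small-tau expansion of Eq. E12."""
    log_tau = math.log(tau)
    inner = math.log(2.0 * log_tau * log_tau)
    return -log_tau + inner - 2.0 * inner / log_tau
```

Because the corrections are only logarithmic in τ, the expansion converges slowly: it is accurate for very small delays and should not be relied upon at τ of order one.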
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